The dialect conversion-based bufferization passes have been migrated to
One-Shot Bufferize about two years ago. To clean up the code base, this
commit removes the `scf-bufferize` pass, one of the few remaining parts
of the old infrastructure. Most bufferization passes have already been
removed.
Note for LLVM integration: If you depend on this pass, migrate to
One-Shot Bufferize or copy the pass to your codebase.
Adds a canonicalization pattern for scf.forall that replaces constant
induction variables with a constant index. There is a similar
canonicalization that completely removes constant induction variables
from the loop, but that pattern does not apply on foralls with mappings,
so this one is necessary for those cases.
---------
Signed-off-by: Max Dawkins <max.dawkins@gmail.com>
When pipelining an `scf.for` with dynamic loop bounds, the epilogue
ramp-down must align with the prologue when num_stages >
total_iterations.
For example:
```
scf.for (0..ub) {
load(i)
add(i)
store(i)
}
```
When num_stages=3 the pipeline follows:
```
load(0) - add(0) - scf.for (0..ub-2) - store(ub-2)
load(1) - - add(ub-1) - store(ub-1)
```
The trailing `store(ub-2)`, `i=ub-2`, must align with the ramp-up for
`i=0` when `ub < num_stages-1`, so the index `i` should be `max(0,
ub-2)` and each subsequent index is an increment. The predicate must
also handle this scenario, so it becomes `predicate[0] =
total_iterations > epilogue_stage`.
For index type of induction variable, the indexing math is better
represented using affine ops such as `affine.delinearize_index`.
This also further demonstrates that some of these `affine` ops might
need to move to a different dialect. For one these ops only support
`IndexType` when they should be able to work with any integer type.
This change also includes some canonicalization patterns for
`affine.delinearize_index` operation to
1) Drop unit `basis` values
2) Remove the `delinearize_index` op when the `linear_index` is a loop
induction variable of a normalized loop and the `basis` is of size 1 and
is also the upper bound of the normalized loop.
---------
Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
This fixes loop iteration count calculation if the step is
a negative value, where we should adjust the added
delta from `step-1` to `step+1` when doing the ceil div.
Current folding of one-trip count loop does not kick in with an empty
mapping. Enable this for empty mapping.
Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
Previously the values in the peeled prologue that weren't treated with
the `predicateFn` were passed to the loop body without any other
predication. If those values are later used outside of the loop body,
they may be incorrect if the num iterations is smaller than num stages -
1. We need similar masking for those, as is done in the main loop body,
using already existing predicates.
Define SCF dialect patterns rotating `scf.while` loops leveraging
existing `mlir::scf::wrapWhileLoopInZeroTripCheck`. `forceCreateCheck`
is always `false` as the pattern would lead to an infinite recursion
otherwise.
This pattern rotates `scf.while` ops, mutating them from "while" loops to
"do-while" loops. A guard checking the condition for the first iteration
is inserted. Note this guard can be optimized away if the compiler can
prove the loop will be executed at least once.
Using this pattern, the following while loop:
```mlir
scf.while (%arg0 = %init) : (i32) -> i64 {
%val = .., %arg0 : i64
%cond = arith.cmpi .., %arg0 : i32
scf.condition(%cond) %val : i64
} do {
^bb0(%arg1: i64):
%next = .., %arg1 : i32
scf.yield %next : i32
}
```
Can be transformed into:
``` mlir
%pre_val = .., %init : i64
%pre_cond = arith.cmpi .., %init : i32
scf.if %pre_cond -> i64 {
%res = scf.while (%arg1 = %va0) : (i64) -> i64 {
// Original after block
%next = .., %arg1 : i32
// Original before block
%val = .., %next : i64
%cond = arith.cmpi .., %next : i32
scf.condition(%cond) %val : i64
} do {
^bb0(%arg2: i64):
%scf.yield %arg2 : i32
}
scf.yield %res : i64
} else {
scf.yield %pre_val : i64
}
```
The test pass for `wrapWhileLoopInZeroTripCheck` has been modified to
use the new pattern when `forceCreateCheck=false`.
---------
Signed-off-by: Victor Perez <victor.perez@codeplay.com>
If the `scf.index_switch` op has no result, the current fold logic
results in an infinite loop (see #98535). The is because `fold`
mechanism does not support *erasing* zero-result ops. This PR moves the
fold logic to a canonicalizer and fix the issue.
This reverts commit edbc0e30a9.
Reason for rollback. ASAN complains about this PR:
==4320==ERROR: AddressSanitizer: heap-use-after-free on address 0x502000006cd8 at pc 0x55e2978d63cf bp 0x7ffe6431c2b0 sp 0x7ffe6431c2a8
READ of size 8 at 0x502000006cd8 thread T0
#0 0x55e2978d63ce in map<llvm::MutableArrayRef<mlir::BlockArgument> &, llvm::MutableArrayRef<mlir::BlockArgument>, nullptr> mlir/include/mlir/IR/IRMapping.h:40:11
#1 0x55e2978d63ce in mlir::createFused(mlir::LoopLikeOpInterface, mlir::LoopLikeOpInterface, mlir::RewriterBase&, std::__u::function<llvm::SmallVector<mlir::Value, 6u> (mlir::OpBuilder&, mlir::Location, llvm::ArrayRef<mlir::BlockArgument>)>, llvm::function_ref<void (mlir::RewriterBase&, mlir::LoopLikeOpInterface, mlir::LoopLikeOpInterface&, mlir::IRMapping)>) mlir/lib/Interfaces/LoopLikeInterface.cpp:156:11
#2 0x55e2952a614b in mlir::fuseIndependentSiblingForLoops(mlir::scf::ForOp, mlir::scf::ForOp, mlir::RewriterBase&) mlir/lib/Dialect/SCF/Utils/Utils.cpp:1398:43
#3 0x55e291480c6f in mlir::transform::LoopFuseSiblingOp::apply(mlir::transform::TransformRewriter&, mlir::transform::TransformResults&, mlir::transform::TransformState&) mlir/lib/Dialect/SCF/TransformOps/SCFTransformOps.cpp:482:17
#4 0x55e29149ed5e in mlir::transform::detail::TransformOpInterfaceInterfaceTraits::Model<mlir::transform::LoopFuseSiblingOp>::apply(mlir::transform::detail::TransformOpInterfaceInterfaceTraits::Concept const*, mlir::Operation*, mlir::transform::TransformRewriter&, mlir::transform::TransformResults&, mlir::transform::TransformState&) blaze-out/k8-opt-asan/bin/mlir/include/mlir/Dialect/Transform/Interfaces/TransformInterfaces.h.inc:477:56
#5 0x55e297494a60 in apply blaze-out/k8-opt-asan/bin/mlir/include/mlir/Dialect/Transform/Interfaces/TransformInterfaces.cpp.inc:61:14
#6 0x55e297494a60 in mlir::transform::TransformState::applyTransform(mlir::transform::TransformOpInterface) mlir/lib/Dialect/Transform/Interfaces/TransformInterfaces.cpp:953:48
#7 0x55e294646a8d in applySequenceBlock(mlir::Block&, mlir::transform::FailurePropagationMode, mlir::transform::TransformState&, mlir::transform::TransformResults&) mlir/lib/Dialect/Transform/IR/TransformOps.cpp:1788:15
#8 0x55e29464f927 in mlir::transform::NamedSequenceOp::apply(mlir::transform::TransformRewriter&, mlir::transform::TransformResults&, mlir::transform::TransformState&) mlir/lib/Dialect/Transform/IR/TransformOps.cpp:2155:10
#9 0x55e2945d28ee in mlir::transform::detail::TransformOpInterfaceInterfaceTraits::Model<mlir::transform::NamedSequenceOp>::apply(mlir::transform::detail::TransformOpInterfaceInterfaceTraits::Concept const*, mlir::Operation*, mlir::transform::TransformRewriter&, mlir::transform::TransformResults&, mlir::transform::TransformState&) blaze-out/k8-opt-asan/bin/mlir/include/mlir/Dialect/Transform/Interfaces/TransformInterfaces.h.inc:477:56
#10 0x55e297494a60 in apply blaze-out/k8-opt-asan/bin/mlir/include/mlir/Dialect/Transform/Interfaces/TransformInterfaces.cpp.inc:61:14
#11 0x55e297494a60 in mlir::transform::TransformState::applyTransform(mlir::transform::TransformOpInterface) mlir/lib/Dialect/Transform/Interfaces/TransformInterfaces.cpp:953:48
#12 0x55e2974a5fe2 in mlir::transform::applyTransforms(mlir::Operation*, mlir::transform::TransformOpInterface, mlir::RaggedArray<llvm::PointerUnion<mlir::Operation*, mlir::Attribute, mlir::Value>> const&, mlir::transform::TransformOptions const&, bool) mlir/lib/Dialect/Transform/Interfaces/TransformInterfaces.cpp:2016:16
#13 0x55e2945888d7 in mlir::transform::applyTransformNamedSequence(mlir::RaggedArray<llvm::PointerUnion<mlir::Operation*, mlir::Attribute, mlir::Value>>, mlir::transform::TransformOpInterface, mlir::ModuleOp, mlir::transform::TransformOptions const&) mlir/lib/Dialect/Transform/Transforms/TransformInterpreterUtils.cpp:234:10
#14 0x55e294582446 in (anonymous namespace)::InterpreterPass::runOnOperation() mlir/lib/Dialect/Transform/Transforms/InterpreterPass.cpp:147:16
#15 0x55e2978e93c6 in operator() mlir/lib/Pass/Pass.cpp:527:17
#16 0x55e2978e93c6 in void llvm::function_ref<void ()>::callback_fn<mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int)::$_1>(long) llvm/include/llvm/ADT/STLFunctionalExtras.h:45:12
#17 0x55e2978e207a in operator() llvm/include/llvm/ADT/STLFunctionalExtras.h:68:12
#18 0x55e2978e207a in executeAction<mlir::PassExecutionAction, mlir::Pass &> mlir/include/mlir/IR/MLIRContext.h:275:7
#19 0x55e2978e207a in mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int) mlir/lib/Pass/Pass.cpp:521:21
#20 0x55e2978e5fbf in runPipeline mlir/lib/Pass/Pass.cpp:593:16
#21 0x55e2978e5fbf in mlir::PassManager::runPasses(mlir::Operation*, mlir::AnalysisManager) mlir/lib/Pass/Pass.cpp:904:10
#22 0x55e2978e5b65 in mlir::PassManager::run(mlir::Operation*) mlir/lib/Pass/Pass.cpp:884:60
#23 0x55e291ebb460 in performActions(llvm::raw_ostream&, std::__u::shared_ptr<llvm::SourceMgr> const&, mlir::MLIRContext*, mlir::MlirOptMainConfig const&) mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:408:17
#24 0x55e291ebabd9 in processBuffer mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:481:9
#25 0x55e291ebabd9 in operator() mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:548:12
#26 0x55e291ebabd9 in llvm::LogicalResult llvm::function_ref<llvm::LogicalResult (std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&)>::callback_fn<mlir::MlirOptMain(llvm::raw_ostream&, std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, mlir::DialectRegistry&, mlir::MlirOptMainConfig const&)::$_0>(long, std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&) llvm/include/llvm/ADT/STLFunctionalExtras.h:45:12
#27 0x55e297b1cffe in operator() llvm/include/llvm/ADT/STLFunctionalExtras.h:68:12
#28 0x55e297b1cffe in mlir::splitAndProcessBuffer(std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::function_ref<llvm::LogicalResult (std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&)>, llvm::raw_ostream&, llvm::StringRef, llvm::StringRef)::$_0::operator()(llvm::StringRef) const mlir/lib/Support/ToolUtilities.cpp:86:16
#29 0x55e297b1c9c5 in interleave<const llvm::StringRef *, (lambda at mlir/lib/Support/ToolUtilities.cpp:79:23), (lambda at llvm/include/llvm/ADT/STLExtras.h:2147:49), void> llvm/include/llvm/ADT/STLExtras.h:2125:3
#30 0x55e297b1c9c5 in interleave<llvm::SmallVector<llvm::StringRef, 8U>, (lambda at mlir/lib/Support/ToolUtilities.cpp:79:23), llvm::raw_ostream, llvm::StringRef> llvm/include/llvm/ADT/STLExtras.h:2147:3
#31 0x55e297b1c9c5 in mlir::splitAndProcessBuffer(std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::function_ref<llvm::LogicalResult (std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&)>, llvm::raw_ostream&, llvm::StringRef, llvm::StringRef) mlir/lib/Support/ToolUtilities.cpp:89:3
#32 0x55e291eb0cf0 in mlir::MlirOptMain(llvm::raw_ostream&, std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, mlir::DialectRegistry&, mlir::MlirOptMainConfig const&) mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:551:10
#33 0x55e291eb115c in mlir::MlirOptMain(int, char**, llvm::StringRef, llvm::StringRef, mlir::DialectRegistry&) mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:589:14
#34 0x55e291eb15f8 in mlir::MlirOptMain(int, char**, llvm::StringRef, mlir::DialectRegistry&) mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:605:10
#35 0x55e29130d1be in main mlir/tools/mlir-opt/mlir-opt.cpp:311:33
#36 0x7fbcf3fff3d3 in __libc_start_main (/usr/grte/v5/lib64/libc.so.6+0x613d3) (BuildId: 9a996398ce14a94560b0c642eb4f6e94)
#37 0x55e2912365a9 in _start /usr/grte/v5/debug-src/src/csu/../sysdeps/x86_64/start.S:120
0x502000006cd8 is located 8 bytes inside of 16-byte region [0x502000006cd0,0x502000006ce0)
freed by thread T0 here:
#0 0x55e29130b7e2 in operator delete(void*, unsigned long) compiler-rt/lib/asan/asan_new_delete.cpp:155:3
#1 0x55e2979eb657 in __libcpp_operator_delete<void *, unsigned long>
#2 0x55e2979eb657 in __do_deallocate_handle_size<>
#3 0x55e2979eb657 in __libcpp_deallocate
#4 0x55e2979eb657 in deallocate
#5 0x55e2979eb657 in deallocate
#6 0x55e2979eb657 in operator()
#7 0x55e2979eb657 in ~vector
#8 0x55e2979eb657 in mlir::Block::~Block() mlir/lib/IR/Block.cpp:24:1
#9 0x55e2979ebc17 in deleteNode llvm/include/llvm/ADT/ilist.h:42:39
#10 0x55e2979ebc17 in erase llvm/include/llvm/ADT/ilist.h:205:5
#11 0x55e2979ebc17 in erase llvm/include/llvm/ADT/ilist.h:209:39
#12 0x55e2979ebc17 in mlir::Block::erase() mlir/lib/IR/Block.cpp:67:28
#13 0x55e297aef978 in mlir::RewriterBase::eraseBlock(mlir::Block*) mlir/lib/IR/PatternMatch.cpp:245:10
#14 0x55e297af0563 in mlir::RewriterBase::inlineBlockBefore(mlir::Block*, mlir::Block*, llvm::ilist_iterator<llvm::ilist_detail::node_options<mlir::Operation, false, false, void, false, void>, false, false>, mlir::ValueRange) mlir/lib/IR/PatternMatch.cpp:331:3
#15 0x55e297af06d8 in mlir::RewriterBase::mergeBlocks(mlir::Block*, mlir::Block*, mlir::ValueRange) mlir/lib/IR/PatternMatch.cpp:341:3
#16 0x55e297036608 in mlir::scf::ForOp::replaceWithAdditionalYields(mlir::RewriterBase&, mlir::ValueRange, bool, std::__u::function<llvm::SmallVector<mlir::Value, 6u> (mlir::OpBuilder&, mlir::Location, llvm::ArrayRef<mlir::BlockArgument>)> const&) mlir/lib/Dialect/SCF/IR/SCF.cpp:575:12
#17 0x55e2970673ca in mlir::detail::LoopLikeOpInterfaceInterfaceTraits::Model<mlir::scf::ForOp>::replaceWithAdditionalYields(mlir::detail::LoopLikeOpInterfaceInterfaceTraits::Concept const*, mlir::Operation*, mlir::RewriterBase&, mlir::ValueRange, bool, std::__u::function<llvm::SmallVector<mlir::Value, 6u> (mlir::OpBuilder&, mlir::Location, llvm::ArrayRef<mlir::BlockArgument>)> const&) blaze-out/k8-opt-asan/bin/mlir/include/mlir/Interfaces/LoopLikeInterface.h.inc:658:56
#18 0x55e2978d5feb in replaceWithAdditionalYields blaze-out/k8-opt-asan/bin/mlir/include/mlir/Interfaces/LoopLikeInterface.cpp.inc:105:14
#19 0x55e2978d5feb in mlir::createFused(mlir::LoopLikeOpInterface, mlir::LoopLikeOpInterface, mlir::RewriterBase&, std::__u::function<llvm::SmallVector<mlir::Value, 6u> (mlir::OpBuilder&, mlir::Location, llvm::ArrayRef<mlir::BlockArgument>)>, llvm::function_ref<void (mlir::RewriterBase&, mlir::LoopLikeOpInterface, mlir::LoopLikeOpInterface&, mlir::IRMapping)>) mlir/lib/Interfaces/LoopLikeInterface.cpp:135:14
#20 0x55e2952a614b in mlir::fuseIndependentSiblingForLoops(mlir::scf::ForOp, mlir::scf::ForOp, mlir::RewriterBase&) mlir/lib/Dialect/SCF/Utils/Utils.cpp:1398:43
#21 0x55e291480c6f in mlir::transform::LoopFuseSiblingOp::apply(mlir::transform::TransformRewriter&, mlir::transform::TransformResults&, mlir::transform::TransformState&) mlir/lib/Dialect/SCF/TransformOps/SCFTransformOps.cpp:482:17
#22 0x55e29149ed5e in mlir::transform::detail::TransformOpInterfaceInterfaceTraits::Model<mlir::transform::LoopFuseSiblingOp>::apply(mlir::transform::detail::TransformOpInterfaceInterfaceTraits::Concept const*, mlir::Operation*, mlir::transform::TransformRewriter&, mlir::transform::TransformResults&, mlir::transform::TransformState&) blaze-out/k8-opt-asan/bin/mlir/include/mlir/Dialect/Transform/Interfaces/TransformInterfaces.h.inc:477:56
#23 0x55e297494a60 in apply blaze-out/k8-opt-asan/bin/mlir/include/mlir/Dialect/Transform/Interfaces/TransformInterfaces.cpp.inc:61:14
#24 0x55e297494a60 in mlir::transform::TransformState::applyTransform(mlir::transform::TransformOpInterface) mlir/lib/Dialect/Transform/Interfaces/TransformInterfaces.cpp:953:48
#25 0x55e294646a8d in applySequenceBlock(mlir::Block&, mlir::transform::FailurePropagationMode, mlir::transform::TransformState&, mlir::transform::TransformResults&) mlir/lib/Dialect/Transform/IR/TransformOps.cpp:1788:15
#26 0x55e29464f927 in mlir::transform::NamedSequenceOp::apply(mlir::transform::TransformRewriter&, mlir::transform::TransformResults&, mlir::transform::TransformState&) mlir/lib/Dialect/Transform/IR/TransformOps.cpp:2155:10
#27 0x55e2945d28ee in mlir::transform::detail::TransformOpInterfaceInterfaceTraits::Model<mlir::transform::NamedSequenceOp>::apply(mlir::transform::detail::TransformOpInterfaceInterfaceTraits::Concept const*, mlir::Operation*, mlir::transform::TransformRewriter&, mlir::transform::TransformResults&, mlir::transform::TransformState&) blaze-out/k8-opt-asan/bin/mlir/include/mlir/Dialect/Transform/Interfaces/TransformInterfaces.h.inc:477:56
#28 0x55e297494a60 in apply blaze-out/k8-opt-asan/bin/mlir/include/mlir/Dialect/Transform/Interfaces/TransformInterfaces.cpp.inc:61:14
#29 0x55e297494a60 in mlir::transform::TransformState::applyTransform(mlir::transform::TransformOpInterface) mlir/lib/Dialect/Transform/Interfaces/TransformInterfaces.cpp:953:48
#30 0x55e2974a5fe2 in mlir::transform::applyTransforms(mlir::Operation*, mlir::transform::TransformOpInterface, mlir::RaggedArray<llvm::PointerUnion<mlir::Operation*, mlir::Attribute, mlir::Value>> const&, mlir::transform::TransformOptions const&, bool) mlir/lib/Dialect/Transform/Interfaces/TransformInterfaces.cpp:2016:16
#31 0x55e2945888d7 in mlir::transform::applyTransformNamedSequence(mlir::RaggedArray<llvm::PointerUnion<mlir::Operation*, mlir::Attribute, mlir::Value>>, mlir::transform::TransformOpInterface, mlir::ModuleOp, mlir::transform::TransformOptions const&) mlir/lib/Dialect/Transform/Transforms/TransformInterpreterUtils.cpp:234:10
#32 0x55e294582446 in (anonymous namespace)::InterpreterPass::runOnOperation() mlir/lib/Dialect/Transform/Transforms/InterpreterPass.cpp:147:16
#33 0x55e2978e93c6 in operator() mlir/lib/Pass/Pass.cpp:527:17
#34 0x55e2978e93c6 in void llvm::function_ref<void ()>::callback_fn<mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int)::$_1>(long) llvm/include/llvm/ADT/STLFunctionalExtras.h:45:12
#35 0x55e2978e207a in operator() llvm/include/llvm/ADT/STLFunctionalExtras.h:68:12
#36 0x55e2978e207a in executeAction<mlir::PassExecutionAction, mlir::Pass &> mlir/include/mlir/IR/MLIRContext.h:275:7
#37 0x55e2978e207a in mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int) mlir/lib/Pass/Pass.cpp:521:21
#38 0x55e2978e5fbf in runPipeline mlir/lib/Pass/Pass.cpp:593:16
#39 0x55e2978e5fbf in mlir::PassManager::runPasses(mlir::Operation*, mlir::AnalysisManager) mlir/lib/Pass/Pass.cpp:904:10
#40 0x55e2978e5b65 in mlir::PassManager::run(mlir::Operation*) mlir/lib/Pass/Pass.cpp:884:60
#41 0x55e291ebb460 in performActions(llvm::raw_ostream&, std::__u::shared_ptr<llvm::SourceMgr> const&, mlir::MLIRContext*, mlir::MlirOptMainConfig const&) mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:408:17
#42 0x55e291ebabd9 in processBuffer mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:481:9
#43 0x55e291ebabd9 in operator() mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:548:12
#44 0x55e291ebabd9 in llvm::LogicalResult llvm::function_ref<llvm::LogicalResult (std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&)>::callback_fn<mlir::MlirOptMain(llvm::raw_ostream&, std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, mlir::DialectRegistry&, mlir::MlirOptMainConfig const&)::$_0>(long, std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&) llvm/include/llvm/ADT/STLFunctionalExtras.h:45:12
#45 0x55e297b1cffe in operator() llvm/include/llvm/ADT/STLFunctionalExtras.h:68:12
#46 0x55e297b1cffe in mlir::splitAndProcessBuffer(std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::function_ref<llvm::LogicalResult (std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&)>, llvm::raw_ostream&, llvm::StringRef, llvm::StringRef)::$_0::operator()(llvm::StringRef) const mlir/lib/Support/ToolUtilities.cpp:86:16
#47 0x55e297b1c9c5 in interleave<const llvm::StringRef *, (lambda at mlir/lib/Support/ToolUtilities.cpp:79:23), (lambda at llvm/include/llvm/ADT/STLExtras.h:2147:49), void> llvm/include/llvm/ADT/STLExtras.h:2125:3
#48 0x55e297b1c9c5 in interleave<llvm::SmallVector<llvm::StringRef, 8U>, (lambda at mlir/lib/Support/ToolUtilities.cpp:79:23), llvm::raw_ostream, llvm::StringRef> llvm/include/llvm/ADT/STLExtras.h:2147:3
#49 0x55e297b1c9c5 in mlir::splitAndProcessBuffer(std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::function_ref<llvm::LogicalResult (std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&)>, llvm::raw_ostream&, llvm::StringRef, llvm::StringRef) mlir/lib/Support/ToolUtilities.cpp:89:3
#50 0x55e291eb0cf0 in mlir::MlirOptMain(llvm::raw_ostream&, std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, mlir::DialectRegistry&, mlir::MlirOptMainConfig const&) mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:551:10
#51 0x55e291eb115c in mlir::MlirOptMain(int, char**, llvm::StringRef, llvm::StringRef, mlir::DialectRegistry&) mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:589:14
previously allocated by thread T0 here:
#0 0x55e29130ab5d in operator new(unsigned long) compiler-rt/lib/asan/asan_new_delete.cpp:86:3
#1 0x55e2979ed5d4 in __libcpp_operator_new<unsigned long>
#2 0x55e2979ed5d4 in __libcpp_allocate
#3 0x55e2979ed5d4 in allocate
#4 0x55e2979ed5d4 in __allocate_at_least<std::__u::allocator<mlir::BlockArgument> >
#5 0x55e2979ed5d4 in __split_buffer
#6 0x55e2979ed5d4 in mlir::BlockArgument* std::__u::vector<mlir::BlockArgument, std::__u::allocator<mlir::BlockArgument>>::__push_back_slow_path<mlir::BlockArgument const&>(mlir::BlockArgument const&)
#7 0x55e2979ec0f2 in push_back
#8 0x55e2979ec0f2 in mlir::Block::addArgument(mlir::Type, mlir::Location) mlir/lib/IR/Block.cpp:154:13
#9 0x55e29796e457 in parseRegionBody mlir/lib/AsmParser/Parser.cpp:2172:34
#10 0x55e29796e457 in (anonymous namespace)::OperationParser::parseRegion(mlir::Region&, llvm::ArrayRef<mlir::OpAsmParser::Argument>, bool) mlir/lib/AsmParser/Parser.cpp:2121:7
#11 0x55e29796b25e in (anonymous namespace)::CustomOpAsmParser::parseRegion(mlir::Region&, llvm::ArrayRef<mlir::OpAsmParser::Argument>, bool) mlir/lib/AsmParser/Parser.cpp:1785:16
#12 0x55e297035742 in mlir::scf::ForOp::parse(mlir::OpAsmParser&, mlir::OperationState&) mlir/lib/Dialect/SCF/IR/SCF.cpp:521:14
#13 0x55e291322c18 in llvm::ParseResult llvm::detail::UniqueFunctionBase<llvm::ParseResult, mlir::OpAsmParser&, mlir::OperationState&>::CallImpl<llvm::ParseResult (*)(mlir::OpAsmParser&, mlir::OperationState&)>(void*, mlir::OpAsmParser&, mlir::OperationState&) llvm/include/llvm/ADT/FunctionExtras.h:220:12
#14 0x55e29795bea3 in operator() llvm/include/llvm/ADT/FunctionExtras.h:384:12
#15 0x55e29795bea3 in callback_fn<llvm::unique_function<llvm::ParseResult (mlir::OpAsmParser &, mlir::OperationState &)> > llvm/include/llvm/ADT/STLFunctionalExtras.h:45:12
#16 0x55e29795bea3 in operator() llvm/include/llvm/ADT/STLFunctionalExtras.h:68:12
#17 0x55e29795bea3 in parseOperation mlir/lib/AsmParser/Parser.cpp:1521:9
#18 0x55e29795bea3 in parseCustomOperation mlir/lib/AsmParser/Parser.cpp:2017:19
#19 0x55e29795bea3 in (anonymous namespace)::OperationParser::parseOperation() mlir/lib/AsmParser/Parser.cpp:1174:10
#20 0x55e297971d20 in parseBlockBody mlir/lib/AsmParser/Parser.cpp:2296:9
#21 0x55e297971d20 in (anonymous namespace)::OperationParser::parseBlock(mlir::Block*&) mlir/lib/AsmParser/Parser.cpp:2226:12
#22 0x55e29796e4f5 in parseRegionBody mlir/lib/AsmParser/Parser.cpp:2184:7
#23 0x55e29796e4f5 in (anonymous namespace)::OperationParser::parseRegion(mlir::Region&, llvm::ArrayRef<mlir::OpAsmParser::Argument>, bool) mlir/lib/AsmParser/Parser.cpp:2121:7
#24 0x55e29796b25e in (anonymous namespace)::CustomOpAsmParser::parseRegion(mlir::Region&, llvm::ArrayRef<mlir::OpAsmParser::Argument>, bool) mlir/lib/AsmParser/Parser.cpp:1785:16
#25 0x55e29796b2cf in (anonymous namespace)::CustomOpAsmParser::parseOptionalRegion(mlir::Region&, llvm::ArrayRef<mlir::OpAsmParser::Argument>, bool) mlir/lib/AsmParser/Parser.cpp:1796:12
#26 0x55e2978d89ff in mlir::function_interface_impl::parseFunctionOp(mlir::OpAsmParser&, mlir::OperationState&, bool, mlir::StringAttr, llvm::function_ref<mlir::Type (mlir::Builder&, llvm::ArrayRef<mlir::Type>, llvm::ArrayRef<mlir::Type>, mlir::function_interface_impl::VariadicFlag, std::__u::basic_string<char, std::__u::char_traits<char>, std::__u::allocator<char>>&)>, mlir::StringAttr, mlir::StringAttr) mlir/lib/Interfaces/FunctionImplementation.cpp:232:14
#27 0x55e2969ba41d in mlir::func::FuncOp::parse(mlir::OpAsmParser&, mlir::OperationState&) mlir/lib/Dialect/Func/IR/FuncOps.cpp:203:10
#28 0x55e291322c18 in llvm::ParseResult llvm::detail::UniqueFunctionBase<llvm::ParseResult, mlir::OpAsmParser&, mlir::OperationState&>::CallImpl<llvm::ParseResult (*)(mlir::OpAsmParser&, mlir::OperationState&)>(void*, mlir::OpAsmParser&, mlir::OperationState&) llvm/include/llvm/ADT/FunctionExtras.h:220:12
#29 0x55e29795bea3 in operator() llvm/include/llvm/ADT/FunctionExtras.h:384:12
#30 0x55e29795bea3 in callback_fn<llvm::unique_function<llvm::ParseResult (mlir::OpAsmParser &, mlir::OperationState &)> > llvm/include/llvm/ADT/STLFunctionalExtras.h:45:12
#31 0x55e29795bea3 in operator() llvm/include/llvm/ADT/STLFunctionalExtras.h:68:12
#32 0x55e29795bea3 in parseOperation mlir/lib/AsmParser/Parser.cpp:1521:9
#33 0x55e29795bea3 in parseCustomOperation mlir/lib/AsmParser/Parser.cpp:2017:19
#34 0x55e29795bea3 in (anonymous namespace)::OperationParser::parseOperation() mlir/lib/AsmParser/Parser.cpp:1174:10
#35 0x55e297959b78 in parse mlir/lib/AsmParser/Parser.cpp:2725:20
#36 0x55e297959b78 in mlir::parseAsmSourceFile(llvm::SourceMgr const&, mlir::Block*, mlir::ParserConfig const&, mlir::AsmParserState*, mlir::AsmParserCodeCompleteContext*) mlir/lib/AsmParser/Parser.cpp:2785:41
#37 0x55e29790d5c2 in mlir::parseSourceFile(std::__u::shared_ptr<llvm::SourceMgr> const&, mlir::Block*, mlir::ParserConfig const&, mlir::LocationAttr*) mlir/lib/Parser/Parser.cpp:46:10
#38 0x55e291ebbfe2 in parseSourceFile<mlir::ModuleOp, const std::__u::shared_ptr<llvm::SourceMgr> &> mlir/include/mlir/Parser/Parser.h:159:14
#39 0x55e291ebbfe2 in parseSourceFile<mlir::ModuleOp> mlir/include/mlir/Parser/Parser.h:189:10
#40 0x55e291ebbfe2 in mlir::parseSourceFileForTool(std::__u::shared_ptr<llvm::SourceMgr> const&, mlir::ParserConfig const&, bool) mlir/include/mlir/Tools/ParseUtilities.h:31:12
#41 0x55e291ebb263 in performActions(llvm::raw_ostream&, std::__u::shared_ptr<llvm::SourceMgr> const&, mlir::MLIRContext*, mlir::MlirOptMainConfig const&) mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:383:33
#42 0x55e291ebabd9 in processBuffer mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:481:9
#43 0x55e291ebabd9 in operator() mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:548:12
#44 0x55e291ebabd9 in llvm::LogicalResult llvm::function_ref<llvm::LogicalResult (std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&)>::callback_fn<mlir::MlirOptMain(llvm::raw_ostream&, std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, mlir::DialectRegistry&, mlir::MlirOptMainConfig const&)::$_0>(long, std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&) llvm/include/llvm/ADT/STLFunctionalExtras.h:45:12
#45 0x55e297b1cffe in operator() llvm/include/llvm/ADT/STLFunctionalExtras.h:68:12
#46 0x55e297b1cffe in mlir::splitAndProcessBuffer(std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::function_ref<llvm::LogicalResult (std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&)>, llvm::raw_ostream&, llvm::StringRef, llvm::StringRef)::$_0::operator()(llvm::StringRef) const mlir/lib/Support/ToolUtilities.cpp:86:16
#47 0x55e297b1c9c5 in interleave<const llvm::StringRef *, (lambda at mlir/lib/Support/ToolUtilities.cpp:79:23), (lambda at llvm/include/llvm/ADT/STLExtras.h:2147:49), void> llvm/include/llvm/ADT/STLExtras.h:2125:3
#48 0x55e297b1c9c5 in interleave<llvm::SmallVector<llvm::StringRef, 8U>, (lambda at mlir/lib/Support/ToolUtilities.cpp:79:23), llvm::raw_ostream, llvm::StringRef> llvm/include/llvm/ADT/STLExtras.h:2147:3
#49 0x55e297b1c9c5 in mlir::splitAndProcessBuffer(std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::function_ref<llvm::LogicalResult (std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&)>, llvm::raw_ostream&, llvm::StringRef, llvm::StringRef) mlir/lib/Support/ToolUtilities.cpp:89:3
#50 0x55e291eb0cf0 in mlir::MlirOptMain(llvm::raw_ostream&, std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, mlir::DialectRegistry&, mlir::MlirOptMainConfig const&) mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:551:10
#51 0x55e291eb115c in mlir::MlirOptMain(int, char**, llvm::StringRef, llvm::StringRef, mlir::DialectRegistry&) mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:589:14
#52 0x55e291eb15f8 in mlir::MlirOptMain(int, char**, llvm::StringRef, mlir::DialectRegistry&) mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:605:10
#53 0x55e29130d1be in main mlir/tools/mlir-opt/mlir-opt.cpp:311:33
#54 0x7fbcf3fff3d3 in __libc_start_main (/usr/grte/v5/lib64/libc.so.6+0x613d3) (BuildId: 9a996398ce14a94560b0c642eb4f6e94)
#55 0x55e2912365a9 in _start /usr/grte/v5/debug-src/src/csu/../sysdeps/x86_64/start.S:120
SUMMARY: AddressSanitizer: heap-use-after-free mlir/include/mlir/IR/IRMapping.h:40:11 in map<llvm::MutableArrayRef<mlir::BlockArgument> &, llvm::MutableArrayRef<mlir::BlockArgument>, nullptr>
Shadow bytes around the buggy address:
0x502000006a00: fa fa 00 fa fa fa 00 00 fa fa 00 fa fa fa 00 fa
0x502000006a80: fa fa 00 fa fa fa 00 00 fa fa 00 00 fa fa 00 00
0x502000006b00: fa fa 00 00 fa fa 00 00 fa fa 00 fa fa fa 00 fa
0x502000006b80: fa fa 00 fa fa fa 00 fa fa fa 00 00 fa fa 00 00
0x502000006c00: fa fa 00 00 fa fa 00 00 fa fa 00 00 fa fa fd fa
=>0x502000006c80: fa fa fd fa fa fa fd fd fa fa fd[fd]fa fa fd fd
0x502000006d00: fa fa 00 fa fa fa 00 fa fa fa 00 fa fa fa 00 fa
0x502000006d80: fa fa 00 fa fa fa 00 fa fa fa 00 fa fa fa 00 fa
0x502000006e00: fa fa 00 fa fa fa 00 fa fa fa 00 00 fa fa 00 fa
0x502000006e80: fa fa 00 fa fa fa 00 00 fa fa 00 fa fa fa 00 fa
0x502000006f00: fa fa 00 fa fa fa 00 fa fa fa 00 fa fa fa 00 fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==4320==ABORTING
The refactor had a bug where the fused loop was inserted in an incorrect
location. This patch fixes the bug and relands the original PR
https://github.com/llvm/llvm-project/pull/94391.
This patch refactors code related to LoopFuseSiblingOp transform in
attempt to reduce duplicate common code. The aim is to refactor as much
as possible to a functions on LoopLikeOpInterfaces, but this is still a
work in progress. A full refactor will require more additions to the
LoopLikeOpInterface.
In addition, scf.parallel fusion support has been added.
This patch refactors code related to `LoopFuseSiblingOp` transform in
attempt to reduce duplicate common code. The aim is to refactor as much
as possible to a functions on `LoopLikeOpInterface`s, but this is still
a work in progress. A full refactor will require more additions to the
`LoopLikeOpInterface`.
In addition, `scf.parallel` fusion support has been added.
Unroll And Jam was supported in affine dialect long time ago using pass.
This commit exposes the pattern using transform and in addition adds
partial support for SCF loops.
Remove a TODO in the dialect conversion code base when materializing
unresolved conversions:
```
// FIXME: Determine a suitable insertion location when there are multiple
// inputs.
```
The implementation used to select an insertion point as follows:
- If the cast has exactly one operand: right after the definition of the
SSA value.
- Otherwise: right before the cast op.
However, it is not necessary to change the insertion point. Unresolved
materializations (`UnrealizedConversionCastOp`) are built during
`buildUnresolvedArgumentMaterialization` or
`buildUnresolvedTargetMaterialization`. In the former case, the op is
inserted at the beginning of the block. In the latter case, only one
operand is supported in the dialect conversion, and the op is inserted
right after the definition of the SSA value. I.e., the
`UnrealizedConversionCastOp` is already inserted at the right place and
it is not necessary to change the insertion point for the resolved
materialization op.
Note: The IR change changes slightly because the
`unrealized_conversion_cast` ops at the beginning of a block are no
longer doubly-inverted (by setting the insertion to the beginning of the
block when inserting the `unrealized_conversion_cast` and again when
inserting the resolved conversion op). All affected test cases were
fixed by using `CHECK-DAG` instead of `CHECK`.
Also improve the quality of multiple test cases that did not check for
the correct operands.
Note: This commit is in preparation of decoupling the
argument/source/target materialization logic of the type converter from
the dialect conversion (to reduce its complexity and make that
functionality usable from a new dialect conversion driver).
Changes transform.foreach's interface to take multiple arguments, e.g.
transform.foreach %ops1, %ops2, %params : ... { ^bb0(%op1, %op2,
%param): BODY } The semantics are that the payloads for these handles
get iterated over as if the payloads have been zipped-up together - BODY
gets executed once for each such tuple. The documentation explains that
this implementation requires that the payloads have the same length.
This change also enables the target argument(s) to be any op/value/param
handle.
The added test cases demonstrate some use cases for this change.
Handling parallel region RaW conflicts should usually be the
responsibility of the source program, rather than bufferization
analysis. However, to preserve current functionality, checks on parallel
regions is put behind a bufferization in this PR, which is on by
default. Default functionality will not change, but this PR enables the
option to leave parallelism checks out of the bufferization analysis.
Currently, during a loop pipelining transformation, operations may be
hoisted out without any checks on the loop bounds, which leads to
incorrect transformations and unexpected behaviour. The following [issue
](https://github.com/llvm/llvm-project/issues/90870) describes the
problem more extensively, including an example.
The proposed fix adds some check in the loop bounds before and applies
the maximum hoisting.
There is currently no path to lower scf.forall to scf.parallel with the
goal of targeting the OpenMP dialect.
In the SCF->ControlFlow conversion, scf.forall is briefly converted to
scf.parallel, but the scf.parallel is lowered directly to a sequential
loop. This makes experimenting with scf.forall for CPU execution
difficult.
This change factors out the rewrite in the SCF->ControlFlow pass into a
utility function that can then be used in the SCF->ControlFlow lowering
and via a separate -scf-forall-to-parallel pass.
---------
Co-authored-by: Spenser Bauman <sabauma@fastmail>
When coalescing is some of the loops are unit-trip we can avoid
generating div/rem instructions during delinearization. Ideally we could
use some thing like `affine.delinearize` to handle this but tthat causes
dependence issues.
-- This commit adds a canonicalization pattern to fold away iter args of
scf.forall if :-
a. The corresponding tied result has no use.
b. It is not being modified within the loop.
Signed-off-by: Abhishek Varma <avarma094@gmail.com>
This PR extracts the existing `scf.forall` to `scf.for` conversion logic
inside a transform op (https://github.com/llvm/llvm-project/pull/65474)
into a standalone function which can be used in other transformations
and adds a `scf-forall-to-for` pass.
Add uplifting from `scf.while` to `scf.for`.
This uplifting expects a very specific ops pattern:
* `before` block consisting of single `arith.cmp` op
* `after` block containing `arith.addi`
We also have a set of patterns to cleanup `scf.while` loops to get them
close to the desired form, they will be added in separate PRs.
This is part of upstreaming `numba-mlir` scf uplifting pipeline: `cf ->
scf.while -> scf.for -> scf.parallel`
Original code:
https://github.com/numba/numba-mlir/blob/main/mlir/lib/Transforms/PromoteToParallel.cpp
`verifySchedule` was not looking at the operands of nested operations,
which caused incorrect schedule to be allowed in some cases, potentially
leading to crash during expansion.
There is also minor fix in `cloneAndUpdateOperands` in the pipeline
expander that prevents double visit of the cloned op. This one has no
functional impact, so no test for it.
This commit adds a new public API to `ValueBoundsOpInterface` to compare
values/dims. Supported comparison operators are: LT, LE, EQ, GE, GT.
The new `ValueBoundsOpInterface::compare` API replaces and generalizes
`ValueBoundsOpInterface::areEqual`. Not only does it provide additional
comparison operators, it also works in cases where the difference
between the two values/dims is non-constant. The previous implementation
of `areEqual` used to compute a constant bound of `val1 - val2` (check
if it `== 0` or `!= 0`).
Note: This commit refactors, generalizes and adds a public API for
value/dim comparison. The comparison functionality itself was introduced
in #85895 and is already in use for analyzing `scf.if`.
In the long term, this improvement will allow for a more powerful
analysis of subset ops. A future commit will update
`areOverlappingSlices` to use the new comparison API.
(`areEquivalentSlices` is already using the new API.) This will improve
subset equivalence/disjointness checks with non-constant
offsets/sizes/strides.
This commit adds support for `scf.if` to `ValueBoundsConstraintSet`.
Example:
```
%0 = scf.if ... -> index {
scf.yield %a : index
} else {
scf.yield %b : index
}
```
The following constraints hold for %0:
* %0 >= min(%a, %b)
* %0 <= max(%a, %b)
Such constraints cannot be added to the constraint set; min/max is not
supported by `IntegerRelation`. However, if we know which one of %a and
%b is larger, we can add constraints for %0. E.g., if %a <= %b:
* %0 >= %a
* %0 <= %b
This commit required a few minor changes to the
`ValueBoundsConstraintSet` infrastructure, so that values can be
compared while we are still in the process of traversing the IR/adding
constraints.
Note: This is a re-upload of #85895, which was reverted. The bug that
caused the failure was fixed in #87859.
This commit adds support for `scf.if` to `ValueBoundsConstraintSet`.
Example:
```
%0 = scf.if ... -> index {
scf.yield %a : index
} else {
scf.yield %b : index
}
```
The following constraints hold for %0:
* %0 >= min(%a, %b)
* %0 <= max(%a, %b)
Such constraints cannot be added to the constraint set; min/max is not
supported by `IntegerRelation`. However, if we know which one of %a and
%b is larger, we can add constraints for %0. E.g., if %a <= %b:
* %0 >= %a
* %0 <= %b
This commit required a few minor changes to the
`ValueBoundsConstraintSet` infrastructure, so that values can be
compared while we are still in the process of traversing the IR/adding
constraints.
As part of this extension this change also does some general cleanup
1) Make all the methods take `RewriterBase` as arguments instead of
creating their own builders that tend to crash when used within
pattern rewrites
2) Split `coalesePerfectlyNestedLoops` into two separate methods, one
for `scf.for` and other for `affine.for`. The templatization didnt
seem to be buying much there.
Also general clean up of tests.
If `before` block args are directly forwarded to `scf.condition` make
sure they are passed in the same order.
This is needed for `scf.while` uplifting
https://github.com/llvm/llvm-project/pull/76108
Adds support for fusing two scf.for loops occurring in the same block.
Uses the rudimentary checks already in place for scf.forall (like the
target loop's operands being dominated by the source loop).
- Fixes a bug in the dominance check whereby it was checked that values
in the target loop themselves dominated the source loop rather than the
ops that define these operands.
- Renames the LoopFuseSibling op to LoopFuseSiblingOp.
- Updates LoopFuseSiblingOp's description.
- Adds tests for using LoopFuseSiblingOp on scf.for loops, including one
which fails without the fix for the dominance check.
- Adds tests checking the different failure modes of the dominance
checker.
- Adds test for case whereby scf.yield is automatically generated when
there are no loop-carried variables.
This PR fixes the condition used in loop peeling of the first iteration.
Using ceilDiv instead of floorDiv when calculating the loop counts, so
that the first iteration gets peeled as needed.
One-Shot Bufferize currently does not support loops where a yielded
value bufferizes to a buffer that is different from the buffer of the
region iter_arg. In such a case, the bufferization fails with an error
such as:
```
Yield operand #0 is not equivalent to the corresponding iter bbArg
scf.yield %0 : tensor<5xf32>
```
One common reason for non-equivalent buffers is that an op on the path
from the region iter_arg to the terminator bufferizes out-of-place. Ops
that are analyzed earlier are more likely to bufferize in-place.
This commit adds a new heuristic that gives preference to ops that are
reachable on the reverse SSA use-def chain from a region terminator and
are within the parent region of the terminator. This is expected to work
better than the existing heuristics for loops where an iter_arg is
written to multiple times within a loop, but only one write is fed into
the terminator.
Current users of One-Shot Bufferize are not affected by this change.
"Bottom-up" is still the default heuristic. Users can switch to the new
heuristic manually.
This commit also turns the "fuzzer" pass option into a heuristic,
cleaning up the code a bit.
Properly handle fusion of loops with reductions:
* Check there are no first loop results users between loops
* Create new loop op with merged reduction init values
* Update `scf.reduce` op to contain reductions from both loops
* Update loops users with new loop results
When checking the load indices of the second loop coincide with the
store indices of the first loop, it only considers the index values are
the same or not. However, there are some cases the index values defined
by other operators. In these cases, it will treat them as different even
the results of defining operators are the same.
We already check if the iteration space is the same in isFusionLegal().
When checking operands of defining operators, we only need to consider
the operands come from the same induction variables. If so, we know the
results of defining operators are the same.
Enable the fusion of parallel loops also when the 1st loop contains
multiple write accesses to the same buffer, if the accesses are always
on the same indices.
Fix LIT test cases whose loops were not being fused.
Signed-off-by: Fabrizio Indirli <Fabrizio.Indirli@arm.com>
An op verifier should verify only local properties. This commit removes
the verification of `scf.for` step sizes. (Verifiers can check
attributes but should not follow SSA values.) This verification could
reject IR that is actually valid, e.g.:
```mlir
scf.if %always_false {
// Branch is never entered.
scf.for ... step %c0 { ... }
}
```
This commit fixes `for-loop-peeling.mlir` when running with
`MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS`:
```
within split at llvm-project/mlir/test/Dialect/SCF/for-loop-peeling.mlir:293 offset :9:3: note: see current operation:
"scf.for"(%0, %3, %2) ({
^bb0(%arg1: index):
%4 = "arith.index_cast"(%arg1) : (index) -> i64
"memref.store"(%4, %arg0) : (i64, memref<i64>) -> ()
"scf.yield"() : () -> ()
}) {__peeled_loop__} : (index, index, index) -> ()
LLVM ERROR: IR failed to verify after folding
```
Note: `%2` is `arith.constant 0 : index`.
Before applying the peeling patterns, it can happen that the `ForOp`
gets a step of zero during folding. This leads to a division-by-zero
down the line.
This patch adds an additional check for a constant-zero step and a
test.
Fix https://github.com/llvm/llvm-project/issues/75758
Introduce a new extension for simple print-debugging of the transform
dialect scripts. The initial version of this extension consists of two
ops that are printing the payload objects associated with transform
dialect values. Similar ops were already available in the test extenion
and several downstream projects, and were extensively used for testing.
Add a check to validate that the schedule passed to the pipeliner
transformation is valid and won't cause the pipeliner to break SSA.
This checks that the for each operation in the loop operations are
scheduled after their operands.
The `GreedyPatternRewriteDriver` tries to iteratively fold ops and apply
rewrite patterns to ops. It has special handling for constants: they are
CSE'd and sometimes moved to parent regions to allow for additional
CSE'ing. This happens in `OperationFolder`.
To allow for efficient CSE'ing, `OperationFolder` maintains an internal
lookup data structure to find the existing constant ops with the same
value for each `IsolatedFromAbove` region:
```c++
/// A mapping between an insertion region and the constants that have been
/// created within it.
DenseMap<Region *, ConstantMap> foldScopes;
```
Rewrite patterns are allowed to modify operations. In particular, they
may move operations (including constants) from one region to another
one. Such an IR rewrite can make the above lookup data structure
inconsistent.
We encountered such a bug in a downstream project. This bug materialized
in the form of an op that uses the result of a constant op from a
different `IsolatedFromAbove` region (that is not accessible).
This commit changes the behavior of the `GreedyPatternRewriteDriver`
such that `OperationFolder` is used to CSE constants at the beginning of
each iteration (as the worklist is populated), but no longer during an
iteration. `OperationFolder` is no longer used after populating the
worklist, so we do not have to care about inconsistent state in the
`OperationFolder` due to IR rewrites. The `GreedyPatternRewriteDriver`
now performs the op folding by itself instead of calling
`OperationFolder::tryToFold`.
This change changes the order of constant ops in test cases, but not the
region in which they appear. All broken test cases were fixed by turning
`CHECK` into `CHECK-DAG`.
Alternatives considered: The state of `OperationFolder` could be
partially invalidated with every `notifyOperationModified` notification.
That is more fragile than the solution in this commit because incorrect
rewriter API usage can lead to missing notifications and hard-to-debug
`IsolatedFromAbove` violations. (It did not fix the above mention bug in
a downstream project, which could be due to incorrect rewriter API usage
or due to another conceptual problem that I missed.) Moreover, ops are
frequently getting modified during a greedy pattern rewrite, so we would
likely keep invalidating large parts of the state of `OperationFolder`
over and over.
Migration guide: Turn `CHECK` into `CHECK-DAG` in test cases. Constant
ops are no longer folded during a greedy pattern rewrite. If you rely on
folding (and rematerialization) of constant ops during a greedy pattern
rewrite, turn the folder into a pattern.