clang-p2996/mlir/test/Transforms/single-parallel-loop-collapsing.mlir
Matthias Springer 10056c821a [mlir][SCF] scf.parallel: Make reductions part of the terminator (#75314)
This commit makes reductions part of the terminator. Instead of
`scf.yield`, `scf.reduce` now terminates the body of `scf.parallel` ops.
`scf.reduce` may contain an arbitrary number of reductions, with one
region per reduction.

Example:
```mlir
%init = arith.constant 0.0 : f32
%r:2 = scf.parallel (%iv) = (%lb) to (%ub) step (%step) init (%init, %init)
    -> f32, f32 {
  %elem_to_reduce1 = memref.load %buffer1[%iv] : memref<100xf32>
  %elem_to_reduce2 = memref.load %buffer2[%iv] : memref<100xf32>
  scf.reduce(%elem_to_reduce1, %elem_to_reduce2 : f32, f32) {
    ^bb0(%lhs : f32, %rhs: f32):
      %res = arith.addf %lhs, %rhs : f32
      scf.reduce.return %res : f32
  }, {
    ^bb0(%lhs : f32, %rhs: f32):
      %res = arith.mulf %lhs, %rhs : f32
      scf.reduce.return %res : f32
  }
}
```

`scf.reduce` operations can no longer be interleaved with other ops in
the body of `scf.parallel`. This simplifies the op and makes it possible
to assign the `RecursiveMemoryEffects` trait to `scf.reduce`. (This was
not possible before because the op was not a terminator, causing the op
to be DCE'd.)
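
The "one region per reduction" rule can be modeled roughly in Python. This is only a sketch of the semantics described above, not of the MLIR implementation: `parallel_reduce`, the sample buffers, and the sequential combination order are assumptions made for illustration (`scf.parallel` does not promise any particular order). The sketch also uses 1.0 as the initial value of the product, whereas the example in the commit message passes the same `%init` to both reductions.

```python
import operator


def parallel_reduce(inits, per_iteration_operands, combiners):
    """Hypothetical model of scf.parallel with a terminating scf.reduce.

    inits: one initial value per reduction (the `init (...)` operands).
    per_iteration_operands: for each iteration, one operand per reduction
        (the operands of scf.reduce).
    combiners: one binary function per reduction (the regions of scf.reduce).
    """
    accs = list(inits)
    for operands in per_iteration_operands:
        # The i-th region combines the i-th accumulator with the i-th operand.
        accs = [f(acc, x) for f, acc, x in zip(combiners, accs, operands)]
    return tuple(accs)


# Made-up stand-ins for %buffer1 and %buffer2.
buffer1 = [1.0, 2.0, 3.0]
buffer2 = [2.0, 3.0, 4.0]

# First reduction sums buffer1 (arith.addf), second multiplies buffer2 (arith.mulf).
result = parallel_reduce((0.0, 1.0), zip(buffer1, buffer2),
                         (operator.add, operator.mul))
print(result)  # (6.0, 24.0)
```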
2023-12-20 11:06:27 +09:00

// RUN: mlir-opt -allow-unregistered-dialect %s -pass-pipeline='builtin.module(func.func(test-scf-parallel-loop-collapsing{collapsed-indices-0=0,1}, canonicalize))' | FileCheck %s
func.func @collapse_to_single() {
  %c0 = arith.constant 3 : index
  %c1 = arith.constant 7 : index
  %c2 = arith.constant 11 : index
  %c3 = arith.constant 29 : index
  %c4 = arith.constant 3 : index
  %c5 = arith.constant 4 : index
  scf.parallel (%i0, %i1) = (%c0, %c1) to (%c2, %c3) step (%c4, %c5) {
    %result = "magic.op"(%i0, %i1): (index, index) -> index
  }
  return
}
// CHECK-LABEL: func @collapse_to_single() {
// CHECK-DAG: [[C18:%.*]] = arith.constant 18 : index
// CHECK-DAG: [[C6:%.*]] = arith.constant 6 : index
// CHECK-DAG: [[C3:%.*]] = arith.constant 3 : index
// CHECK-DAG: [[C7:%.*]] = arith.constant 7 : index
// CHECK-DAG: [[C4:%.*]] = arith.constant 4 : index
// CHECK-DAG: [[C1:%.*]] = arith.constant 1 : index
// CHECK-DAG: [[C0:%.*]] = arith.constant 0 : index
// CHECK: scf.parallel ([[NEW_I:%.*]]) = ([[C0]]) to ([[C18]]) step ([[C1]]) {
// CHECK: [[I0_COUNT:%.*]] = arith.remsi [[NEW_I]], [[C6]] : index
// CHECK: [[I1_COUNT:%.*]] = arith.divsi [[NEW_I]], [[C6]] : index
// CHECK: [[V0:%.*]] = arith.muli [[I0_COUNT]], [[C4]] : index
// CHECK: [[I1:%.*]] = arith.addi [[V0]], [[C7]] : index
// CHECK: [[V1:%.*]] = arith.muli [[I1_COUNT]], [[C3]] : index
// CHECK: [[I0:%.*]] = arith.addi [[V1]], [[C3]] : index
// CHECK: "magic.op"([[I0]], [[I1]]) : (index, index) -> index
// CHECK: scf.reduce
// CHECK-NEXT: }
// CHECK-NEXT: return
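
The constants in the CHECK lines can be reproduced by hand. The Python sketch below is not part of the test. It mirrors the arithmetic the CHECK lines expect (normalize each dimension to a trip count, collapse to one loop of 18 iterations, then recover the original indices with `remsi`/`divsi`). The helper names `ceil_div` and `collapse_2d` are made up for illustration.

```python
def ceil_div(a, b):
    """Ceiling division for positive b."""
    return -(-a // b)


def collapse_2d(lbs, ubs, steps):
    """Enumerate (i0, i1) the way the collapsed loop in the CHECK lines does."""
    # Trip count of each original dimension.
    counts = [ceil_div(ub - lb, st) for lb, ub, st in zip(lbs, ubs, steps)]
    total = counts[0] * counts[1]
    pairs = []
    for new_i in range(total):          # scf.parallel (NEW_I) = (0) to (18) step (1)
        i1_count = new_i % counts[1]    # arith.remsi NEW_I, C6
        i0_count = new_i // counts[1]   # arith.divsi NEW_I, C6
        i1 = i1_count * steps[1] + lbs[1]   # muli by 4, addi 7
        i0 = i0_count * steps[0] + lbs[0]   # muli by 3, addi 3
        pairs.append((i0, i1))
    return counts, total, pairs


# Bounds and steps taken from the test: (3, 7) to (11, 29) step (3, 4).
counts, total, pairs = collapse_2d((3, 7), (11, 29), (3, 4))

# Iteration space of the original two-dimensional loop, for comparison.
original = [(i0, i1) for i0 in range(3, 11, 3) for i1 in range(7, 29, 4)]

print(counts, total)      # [3, 6] 18
print(pairs == original)  # True
```

The trip counts are ceil((11 - 3) / 3) = 3 and ceil((29 - 7) / 4) = 6, which is where the constants 6 and 18 in the CHECK lines come from.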