With D107249 I saw huge compile time regressions on a module (150s -> 5700s). This turned out to be due to a huge RefSCC in the module. As we ran the function simplification pipeline on functions in the SCCs of the RefSCC, some of those SCCs would be split out into their own RefSCC, a child of the current RefSCC. We'd skip the remaining SCCs in the huge RefSCC because the current RefSCC was now the RefSCC just split out, then revisit the original huge RefSCC from the beginning. This happened many times because many functions in the RefSCC were optimizable to the point of becoming their own RefSCC.

This patch makes it so we don't skip SCCs that are not in the current RefSCC, so that we split out all the child RefSCCs on the first iteration over the RefSCC. When we split out a RefSCC, we invalidate the original RefSCC and add the remainder of the SCCs into a new RefSCC in RCWorklist. This happens repeatedly until we finish visiting all SCCs, at which point there is only one valid RefSCC in RCWorklist, formed from the original RefSCC and containing all the SCCs that were not split out, and we visit that.

For example, in the newly added test cgscc-refscc-mutation-order.ll, we'd previously run instcombine in this order: f1, f2, f1, f3, f1, f4, f1. Now it's: f1, f2, f3, f4, f1.

This can cause more passes to be run in some specific cases, e.g. if f1<->f2 gets optimized to f1<-f2, we'd previously run f1, f2; now we run f1, f2, f2.

This improves kimwitu++ compile times by a lot (12-15% for various -O3 configs): https://llvm-compile-time-tracker.com/compare.php?from=2371c5a0e06d22b48da0427cebaf53a5e5c54635&to=00908f1d67400cab1ad7bcd7cacc7558d1672e97&stat=instructions

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D121953
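The before/after visitation orders above can be reproduced with a toy model. This is not LLVM's actual CGSCCPassManager code; it is a sketch under the assumption (matching the example) that every function except f1 splits out into its own RefSCC once simplified:

```python
def run_passes(order, func):
    """Record a visit; return True if this SCC splits out of the RefSCC.

    Toy assumption mirroring the commit message's example: every
    function except f1 becomes its own RefSCC after simplification."""
    order.append(func)
    return func != "f1"

def old_visit(funcs):
    """Old behavior: a split abandons the remaining SCCs, and the
    remainder of the original RefSCC is revisited from the beginning."""
    order = []
    worklist = [list(funcs)]
    while worklist:
        refscc = worklist.pop()
        for i, f in enumerate(refscc):
            if run_passes(order, f):
                rest = refscc[:i] + refscc[i + 1:]
                if rest:
                    worklist.append(rest)  # revisit remainder from scratch
                break
    return order

def new_visit(funcs):
    """New behavior: keep visiting the SCCs of the original RefSCC even
    after some split out; the SCCs that did not split out form one new
    RefSCC that is visited once at the end."""
    order = []
    remainder = [f for f in funcs if not run_passes(order, f)]
    for f in remainder:
        run_passes(order, f)
    return order

print(old_visit(["f1", "f2", "f3", "f4"]))  # ['f1', 'f2', 'f1', 'f3', 'f1', 'f4', 'f1']
print(new_visit(["f1", "f2", "f3", "f4"]))  # ['f1', 'f2', 'f3', 'f4', 'f1']
```

The old scheme revisits f1 once per split (quadratic in the number of splittable SCCs), which is where the 150s -> 5700s blowup came from.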
; When an SCC gets split due to inlining, we have two mechanisms for reprocessing the updated SCC: the first is UR.UpdatedC,
; which repeatedly reruns the new, current SCC; the second is a worklist for all newly split SCCs. We need to avoid rerunning
; the same SCC when it is scheduled by both mechanisms back to back. In pathological cases, such extra,
; redundant reruns could cause exponential size growth due to inlining along cycles.
;
; The test cases here illustrate a potential redundant rerun and how it is prevented; however, there is no extra inlining
; even if we allow the redundant rerun. In real code, when the inliner makes different decisions for different call sites
; of the same caller-callee edge, we could end up with more recursive inlining without SCC mutation.
;
; REQUIRES: asserts
; RUN: opt < %s -passes='cgscc(inline)' -inline-threshold=500 -debug-only=cgscc -S 2>&1 | FileCheck %s

; CHECK: Running an SCC pass across the RefSCC: [(test1_a, test1_b, test1_c)]
; CHECK: Enqueuing the existing SCC in the worklist:(test1_b)
; CHECK: Enqueuing a newly formed SCC:(test1_c)
; CHECK: Enqueuing a new RefSCC in the update worklist: [(test1_b)]
; CHECK: Switch an internal ref edge to a call edge from 'test1_a' to 'test1_c'
; CHECK: Switch an internal ref edge to a call edge from 'test1_a' to 'test1_a'
; CHECK: Re-running SCC passes after a refinement of the current SCC: (test1_c, test1_a)
; CHECK: Skipping redundant run on SCC: (test1_c, test1_a)

declare void @external(i32 %seed)

define void @test1_a(i32 %num) {
entry:
  call void @test1_b(i32 %num)
  call void @external(i32 %num)
  ret void
}

define void @test1_b(i32 %num) {
entry:
  call void @test1_c(i32 %num)
  call void @test1_a(i32 %num)
  call void @external(i32 %num)
  ret void
}

define void @test1_c(i32 %num) #0 {
  call void @test1_a(i32 %num)
  ret void
}

attributes #0 = { noinline nounwind optnone }
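The back-to-back dedup that the final two CHECK lines exercise can be modeled as follows. This is a hypothetical simplification, not LLVM's actual CGSCCPassManager logic: `refinements` stands in for the UR.UpdatedC mechanism (passes refining the current SCC), and `worklist` stands in for the newly split SCCs; an SCC popped from the worklist is skipped when it is the SCC that was just rerun:

```python
class SCC:
    """Minimal stand-in for an SCC; identity is what matters for dedup."""
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        return f"({self.name})"

def run_with_dedup(initial, refinements, worklist):
    """Toy model: run the initial SCC, rerun each refinement of the
    current SCC (the UR.UpdatedC path), then drain the split-SCC
    worklist, skipping an entry that equals the SCC just processed."""
    log = [f"Running on SCC: {initial}"]
    current = initial
    for updated in refinements:
        current = updated
        log.append(f"Re-running SCC passes after a refinement of the current SCC: {current}")
    for scc in worklist:
        if scc is current:
            log.append(f"Skipping redundant run on SCC: {scc}")
        else:
            current = scc
            log.append(f"Running on SCC: {scc}")
    return log

# Mirrors the test: (test1_c, test1_a) is both the refined current SCC
# and a worklist entry, so its second scheduling is skipped.
ca = SCC("test1_c, test1_a")
for line in run_with_dedup(SCC("test1_a"), [ca], [ca, SCC("test1_b")]):
    print(line)
```

Without the identity check, (test1_c, test1_a) would be processed twice in a row, which in real code with inlining along cycles is where the exponential growth described in the header comment comes from.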