The main goal of the model is to avoid *increasing* function size, as
that would eradicate any memory locality benefits from splitting. This
happens when:
- There are too many inputs or outputs to the cold region. Argument
materialization and reloads of outputs have a cost.
- The cold region has too many distinct exit blocks, causing a large
switch to be formed in the caller.
- The code size cost of the split code is less than the cost of a
set-up call.
A secondary goal is to prevent excessive overall binary size growth.
With the cost model in place, I experimented to find a splitting
threshold that works well in practice. To make warm & cold code easily
separable for analysis purposes, I moved split functions to a "cold"
section. I experimented with thresholds between [0, 4] and set the
default to the threshold which minimized geomean __text size.
Experiment data from building LNT+externals for X86 (N = 639 programs,
all sizes in bytes):
| Configuration | __text geom size | __cold geom size | TEXT geom size |
| **-Os** | 1736.3 | 0, n=0 | 10961.6 |
| -Os, thresh=0 | 1740.53 | 124.482, n=134 | 11014 |
| -Os, thresh=1 | 1734.79 | 57.8781, n=90 | 10978.6 |
| -Os, thresh=2 | ** 1733.85 ** | 65.6604, n=61 | 10977.6 |
| -Os, thresh=3 | 1733.85 | 65.3071, n=61 | 10977.6 |
| -Os, thresh=4 | 1735.08 | 67.5156, n=54 | 10965.7 |
| **-Oz** | 1554.4 | 0, n=0 | 10153 |
| -Oz, thresh=2 | ** 1552.2 ** | 65.633, n=61 | 10176 |
| **-O3** | 2563.37 | 0, n=0 | 13105.4 |
| -O3, thresh=2 | ** 2559.49 ** | 71.1072, n=61 | 13162.4 |
Picking thresh=2 reduces the geomean __text section size by 0.14% at
-Os, -Oz, and -O3 and causes ~0.2% growth in the TEXT segment. Note that
TEXT size is page-aligned, whereas section sizes are byte-aligned.
Experiment data from building LNT+externals for ARM64 (N = 558 programs,
all sizes in bytes):
| Configuration | __text geom size | __cold geom size | TEXT geom size |
| **-Os** | 1763.96 | 0, n=0 | 42934.9 |
| -Os, thresh=2 | ** 1760.9 ** | 76.6755, n=61 | 42934.9 |
Picking thresh=2 reduces the geomean __text section size by 0.17% at
-Os and causes no growth in the TEXT segment.
Measurements were done with D57082 (r352080) applied.
Differential Revision: https://reviews.llvm.org/D57125
llvm-svn: 352228
38 lines
971 B
LLVM
38 lines
971 B
LLVM
; RUN: opt -hotcoldsplit -hotcoldsplit-threshold=-1 -pass-remarks=hotcoldsplit -S < %s 2>&1 | FileCheck %s
|
|
; RUN: opt -passes=hotcoldsplit -hotcoldsplit-threshold=-1 -pass-remarks=hotcoldsplit -S < %s 2>&1 | FileCheck %s
|
|
|
|
; Make sure this compiles. This test used to fail with an invalid phi node: the
|
|
; two predecessors were outlined and the SSA representation was invalid.
|
|
|
|
; CHECK: remark: <unknown>:0:0: fun split cold code into fun.cold.1
|
|
; CHECK-LABEL: @fun
|
|
; CHECK: codeRepl:
|
|
; CHECK-NEXT: call void @fun.cold.1
|
|
|
|
; CHECK: define {{.*}}@fun.cold.1{{.*}} [[cold_attr:#[0-9]+]]
|
|
; CHECK: attributes [[cold_attr]] = { {{.*}}noreturn
|
|
|
|
define void @fun() {
|
|
entry:
|
|
br i1 undef, label %if.then, label %if.else
|
|
|
|
if.then:
|
|
ret void
|
|
|
|
if.else:
|
|
br label %if.then4
|
|
|
|
if.then4:
|
|
br i1 undef, label %if.then5, label %if.end
|
|
|
|
if.then5:
|
|
br label %cleanup
|
|
|
|
if.end:
|
|
br label %cleanup
|
|
|
|
cleanup:
|
|
%cleanup.dest.slot.0 = phi i32 [ 1, %if.then5 ], [ 0, %if.end ]
|
|
unreachable
|
|
}
|