When function splitting is used, BOLT may skip tentative code layout
estimation in some cases, such as:
- when there is no profile data for some blocks (i.e. cold blocks)
- when there are cold functions in lite mode
- when some functions are skipped
However, when rewriting the binary we still need to compute PC-relative
distances between hot and cold basic blocks. Without cold layout
estimation, BOLT uses '0x0' as the address of the first cold block,
leading to incorrect estimations of any PC-relative addresses.
This affects large binaries: the relaxStub method expands more
branches than necessary into the short-jump sequence, as it wrongly
believes the branch distance limit has been exceeded.
The result is a larger and slower sequence, and therefore increased code size.
However, the performance regression is expected to be minimal, since this only
affects cold code that actually gets called.
Example of such an unnecessary relaxation:
from:
```armasm
b .Ltmp1234
```
to:
```armasm
adrp x16, .Ltmp1234
add x16, x16, :lo12:.Ltmp1234
br x16
```
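The effect of the bogus '0x0' cold address can be illustrated with a small model of the range check. This is a simplified sketch, not BOLT's actual relaxStub logic: the function name `needs_relaxation` and the addresses are hypothetical, and the only architectural fact assumed is that an AArch64 `b` encodes a signed 26-bit word offset (roughly +/-128 MiB).

```python
# Simplified model of a branch-range check; NOT BOLT's implementation.
# An AArch64 `b` holds a signed 26-bit immediate scaled by 4,
# so it reaches byte offsets in [-2**27, 2**27 - 4].
B_RANGE = 1 << 27  # 128 MiB


def needs_relaxation(src_addr: int, dst_addr: int) -> bool:
    """Return True if a direct `b` at src_addr cannot reach dst_addr."""
    offset = dst_addr - src_addr
    return not (-B_RANGE <= offset < B_RANGE)


# Hypothetical addresses: a hot block placed 256 MiB into a large binary,
# with its cold fragment laid out 16 KiB after it.
hot_branch_addr = 0x1000_0000
real_cold_addr = 0x1000_4000

# With a proper tentative layout the target is close: a plain `b` suffices.
relax_with_layout = needs_relaxation(hot_branch_addr, real_cold_addr)

# Without cold layout estimation the cold block appears to sit at 0x0,
# so the branch looks out of range and gets expanded needlessly.
relax_without_layout = needs_relaxation(hot_branch_addr, 0x0)

print(relax_with_layout, relax_without_layout)
```

In this model the same branch is classified differently depending only on whether the cold address was estimated, which is why the problem shows up only once hot code sits more than about 128 MiB away from address zero.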
# This test checks that tentative code layout for cold blocks always runs.
# It commonly happens when using lite mode with split functions.

# REQUIRES: system-linux, asserts

# RUN: %clang %cflags -o %t %s
# RUN: %clang %s %cflags -Wl,-q -o %t
# RUN: link_fdata --no-lbr %s %t %t.fdata
# RUN: llvm-bolt %t -o %t.bolt --data %t.fdata -split-functions \
# RUN: -debug 2>&1 | FileCheck %s

.text
.globl foo
.type foo, %function
foo:
.entry_bb:
# FDATA: 1 foo #.entry_bb# 10
cmp x0, #0
b.eq .Lcold_bb1
ret
.Lcold_bb1:
ret

## Force relocation mode.
.reloc 0, R_AARCH64_NONE

# CHECK: foo{{.*}} cold tentative: {{.*}}