Files
clang-p2996/bolt/test/X86/pre-aggregated-perf.test
Amir Ayupov 03bbd04bb7 [BOLT][NFCI] Skip validation in parseLBRSample (#143288)
Parsed branches and fall-throughs are validated in `doBranch` and
`doTrace` respectively. Simplify parseLBRSample by omitting the
validation. This also speeds up perf data processing as checks are only
done once for aggregated branches/fall-throughs and not individual LBR
entries.

Since invalid/external addresses are no longer sanitized during parsing,
sanitize them in `doBranch`.

Test Plan: updated X86/pre-aggregated-perf.test
2025-06-08 17:50:02 -07:00

104 lines
4.8 KiB
Plaintext

## This script checks that perf2bolt is reading pre-aggregated perf information
## correctly for a simple example. The perf.data of this example was generated
## with the following command:
##
## $ perf record -j any,u -e branch -o perf.data -- ./blarge
##
## blarge is the binary for "basicmath large inputs" taken from Mibench.
## Currently failing in MacOS / generating different hash for usqrt
REQUIRES: system-linux
RUN: yaml2obj %p/Inputs/blarge.yaml &> %t.exe
RUN: perf2bolt %t.exe -o %t --pa -p %p/Inputs/pre-aggregated.txt -w %t.new \
RUN: --show-density --heatmap %t.hm \
RUN: --profile-density-threshold=9 --profile-density-cutoff-hot=970000 \
RUN: --profile-use-dfs | FileCheck %s --check-prefix=CHECK-P2B
RUN: FileCheck --input-file %t.hm-section-hotness.csv --check-prefix=CHECK-HM %s
CHECK-P2B: HEATMAP: building heat map
CHECK-P2B: BOLT-INFO: 4 out of 7 functions in the binary (57.1%) have non-empty execution profile
CHECK-P2B: BOLT-INFO: Functions with density >= 21.7 account for 97.00% total sample counts.
CHECK-HM: .text, 0x400680, 0x401232, 100.0000, 4.2553, 0.0426
RUN: perf2bolt %t.exe -o %t --pa -p %p/Inputs/pre-aggregated.txt -w %t.new \
RUN: --show-density \
RUN: --profile-density-cutoff-hot=970000 \
RUN: --profile-use-dfs 2>&1 | FileCheck %s --check-prefix=CHECK-WARNING
CHECK-WARNING: BOLT-INFO: 4 out of 7 functions in the binary (57.1%) have non-empty execution profile
CHECK-WARNING: BOLT-WARNING: BOLT is estimated to optimize better with 2.8x more samples.
CHECK-WARNING: BOLT-INFO: Functions with density >= 21.7 account for 97.00% total sample counts.
RUN: llvm-bolt %t.exe -data %t -o %t.null | FileCheck %s
RUN: llvm-bolt %t.exe -data %t.new -o %t.null | FileCheck %s
RUN: llvm-bolt %t.exe -p %p/Inputs/pre-aggregated.txt --pa -o %t.null | FileCheck %s
CHECK: BOLT-INFO: 4 out of 7 functions in the binary (57.1%) have non-empty execution profile
RUN: FileCheck %s -check-prefix=PERF2BOLT --input-file %t
RUN: FileCheck %s -check-prefix=NEWFORMAT --input-file %t.new
## Test --profile-format option with perf2bolt
RUN: perf2bolt %t.exe -o %t.fdata --pa -p %p/Inputs/pre-aggregated.txt \
RUN: --profile-format=fdata
RUN: FileCheck %s -check-prefix=PERF2BOLT --input-file %t.fdata
RUN: perf2bolt %t.exe -o %t.yaml --pa -p %p/Inputs/pre-aggregated.txt \
RUN: --profile-format=yaml --profile-use-dfs
RUN: FileCheck %s -check-prefix=NEWFORMAT --input-file %t.yaml
## Test --profile-format option with llvm-bolt --aggregate-only
RUN: llvm-bolt %t.exe -o %t.bolt.fdata --pa -p %p/Inputs/pre-aggregated.txt \
RUN: --aggregate-only --profile-format=fdata
RUN: FileCheck %s -check-prefix=PERF2BOLT --input-file %t.bolt.fdata
RUN: llvm-bolt %t.exe -o %t.bolt.yaml --pa -p %p/Inputs/pre-aggregated.txt \
RUN: --aggregate-only --profile-format=yaml --profile-use-dfs
RUN: FileCheck %s -check-prefix=NEWFORMAT --input-file %t.bolt.yaml
## Test pre-aggregated basic profile
RUN: perf2bolt %t.exe -o %t --pa -p %p/Inputs/pre-aggregated-basic.txt -o %t.ba \
RUN: 2>&1 | FileCheck %s --check-prefix=BASIC-ERROR
RUN: perf2bolt %t.exe -o %t --pa -p %p/Inputs/pre-aggregated-basic.txt -o %t.ba.nl \
RUN: -nl 2>&1 | FileCheck %s --check-prefix=BASIC-SUCCESS
RUN: FileCheck %s --input-file %t.ba.nl --check-prefix CHECK-BASIC-NL
BASIC-ERROR: BOLT-INFO: 0 out of 7 functions in the binary (0.0%) have non-empty execution profile
BASIC-SUCCESS: BOLT-INFO: 4 out of 7 functions in the binary (57.1%) have non-empty execution profile
CHECK-BASIC-NL: no_lbr cycles
PERF2BOLT: 1 frame_dummy/1 1e 1 frame_dummy/1 0 0 1
PERF2BOLT-NEXT: 1 main 451 1 SolveCubic 0 0 2
PERF2BOLT-NEXT: 1 main 490 0 [unknown] 0 0 1
PERF2BOLT-NEXT: 1 main 537 0 [unknown] 0 0 1
PERF2BOLT-NEXT: 0 [unknown] 0 1 main 53c 0 2
PERF2BOLT-NEXT: 1 usqrt a 1 usqrt 10 0 22
PERF2BOLT-NEXT: 1 usqrt 30 1 usqrt 32 0 22
PERF2BOLT-NEXT: 1 usqrt 30 1 usqrt 39 4 33
PERF2BOLT-NEXT: 1 usqrt 35 1 usqrt 39 0 22
PERF2BOLT-NEXT: 1 usqrt 3d 1 usqrt 10 0 58
PERF2BOLT-NEXT: 1 usqrt 3d 1 usqrt 3f 0 22
NEWFORMAT: - name: 'frame_dummy/1'
NEWFORMAT: fid: 3
NEWFORMAT: hash: 0x28C72085C0BD8D37
NEWFORMAT: exec: 1
NEWFORMAT: - name: usqrt
NEWFORMAT: fid: 7
NEWFORMAT: exec: 0
NEWFORMAT: nblocks: 5
NEWFORMAT: blocks:
NEWFORMAT: - bid: 0
NEWFORMAT: insns: 4
NEWFORMAT: succ: [ { bid: 1, cnt: 22 } ]
NEWFORMAT: - bid: 1
NEWFORMAT: insns: 9
NEWFORMAT: succ: [ { bid: 3, cnt: 33, mis: 4 }, { bid: 2, cnt: 22 } ]
NEWFORMAT: - bid: 2
NEWFORMAT: insns: 2
NEWFORMAT: succ: [ { bid: 3, cnt: 22 } ]
NEWFORMAT: - bid: 3
NEWFORMAT: insns: 2
NEWFORMAT: succ: [ { bid: 1, cnt: 58 }, { bid: 4, cnt: 22 } ]