In comparison to non-instrumented binaries, PGO instrumentation binaries can be significantly slower. For highly threaded programs, this slowdown can reach 10x due to data races or false sharing within counters. This patch incorporates sampling into the PGO instrumentation process to enhance the speed of instrumentation binaries. The fundamental concept is similar to the one proposed in https://reviews.llvm.org/D63949. Three sampling modes are introduced: 1. Simple Sampling: When '-sampled-instr-bust-duration' is set to 1. 2. Fast Burst Sampling: When not using simple sampling, and '-sampled-instr-period' is set to 65535. This is the default mode of sampling. 3. Full Burst Sampling: When neither simple nor fast burst sampling is used. Utilizing this sampled instrumentation significantly improves the binary's execution speed. Measurements show up to 5x speedup with default settings. Fast burst sampling now results in only around 20% to 30% slowdown (compared to 8 to 10x slowdown without sampling). Out tests show that profile quality remains good with sampling, with edge counts typically showing more than 90% overlap. For applications whose behavior changes due to binary speed, sampling instrumentation can enhance performance. Observations have shown some apps experiencing up to a ~2% improvement in PGO. A potential drawback of this patch is the increased binary size and compilation time. The Sampling method in this patch does not improve single threaded program instrumentation binary speed.
82 KiB
82 KiB