## Motivation
Since we don't need the metadata sections at runtime, we can somehow
offload them from memory at runtime. Initially, I explored [debug info
correlation](https://discourse.llvm.org/t/instrprofiling-lightweight-instrumentation/59113),
which is used for PGO with value profiling disabled. However, it
currently only works with DWARF and it's be hard to add such artificial
debug info for every function in to CodeView which is used on Windows.
So, offloading profile metadata sections at runtime seems to be a
platform independent option.
## Design
The idea is to use new section names for profile name and data sections
and mark them as metadata sections. Under this mode, the new sections
are non-SHF_ALLOC in ELF. So, they are not loaded into memory at runtime
and can be stripped away as a post-linking step. After the process
exits, the generated raw profiles will contains only headers + counters.
llvm-profdata can be used correlate raw profiles with the unstripped
binary to generate indexed profile.
## Data
For chromium base_unittests with code coverage on linux, the binary size
overhead due to instrumentation reduced from 64M to 38.8M (39.4%) and
the raw profile files size reduce from 128M to 68M (46.9%)
```
$ bloaty out/cov/base_unittests.stripped -- out/no-cov/base_unittests.stripped
FILE SIZE VM SIZE
-------------- --------------
+121% +30.4Mi +121% +30.4Mi .text
[NEW] +14.6Mi [NEW] +14.6Mi __llvm_prf_data
[NEW] +10.6Mi [NEW] +10.6Mi __llvm_prf_names
[NEW] +5.86Mi [NEW] +5.86Mi __llvm_prf_cnts
+95% +1.75Mi +95% +1.75Mi .eh_frame
+108% +400Ki +108% +400Ki .eh_frame_hdr
+9.5% +211Ki +9.5% +211Ki .rela.dyn
+9.2% +95.0Ki +9.2% +95.0Ki .data.rel.ro
+5.0% +87.3Ki +5.0% +87.3Ki .rodata
[ = ] 0 +13% +47.0Ki .bss
+40% +1.78Ki +40% +1.78Ki .got
+12% +1.49Ki +12% +1.49Ki .gcc_except_table
[ = ] 0 +65% +1.23Ki .relro_padding
+62% +1.20Ki [ = ] 0 [Unmapped]
+13% +448 +19% +448 .init_array
+8.8% +192 [ = ] 0 [ELF Section Headers]
+0.0% +136 +0.0% +80 [7 Others]
+0.1% +96 +0.1% +96 .dynsym
+1.2% +96 +1.2% +96 .rela.plt
+1.5% +80 +1.2% +64 .plt
[ = ] 0 -99.2% -3.68Ki [LOAD #5 [RW]]
+195% +64.0Mi +194% +64.0Mi TOTAL
$ bloaty out/cov-cor/base_unittests.stripped -- out/no-cov/base_unittests.stripped
FILE SIZE VM SIZE
-------------- --------------
+121% +30.4Mi +121% +30.4Mi .text
[NEW] +5.86Mi [NEW] +5.86Mi __llvm_prf_cnts
+95% +1.75Mi +95% +1.75Mi .eh_frame
+108% +400Ki +108% +400Ki .eh_frame_hdr
+9.5% +211Ki +9.5% +211Ki .rela.dyn
+9.2% +95.0Ki +9.2% +95.0Ki .data.rel.ro
+5.0% +87.3Ki +5.0% +87.3Ki .rodata
[ = ] 0 +13% +47.0Ki .bss
+40% +1.78Ki +40% +1.78Ki .got
+12% +1.49Ki +12% +1.49Ki .gcc_except_table
+13% +448 +19% +448 .init_array
+0.1% +96 +0.1% +96 .dynsym
+1.2% +96 +1.2% +96 .rela.plt
+1.2% +64 +1.2% +64 .plt
+2.9% +64 [ = ] 0 [ELF Section Headers]
+0.0% +40 +0.0% +40 .data
+1.2% +32 +1.2% +32 .got.plt
+0.0% +24 +0.0% +8 [5 Others]
[ = ] 0 -22.9% -872 [LOAD #5 [RW]]
-74.5% -1.44Ki [ = ] 0 [Unmapped]
[ = ] 0 -76.5% -1.45Ki .relro_padding
+118% +38.8Mi +117% +38.8Mi TOTAL
```
A few things to note:
1. llvm-profdata doesn't support filter raw profiles by binary id yet,
so when a raw profile doesn't belongs to the binary being digested by
llvm-profdata, merging will fail. Once this is implemented,
llvm-profdata should be able to only merge raw profiles with the same
binary id as the binary and discard the rest (with mismatched/missing
binary id). The workflow I have in mind is to have scripts invoke
llvm-profdata to get all binary ids for all raw profiles, and
selectively choose the raw pnrofiles with matching binary id and the
binary to llvm-profdata for merging.
2. Note: In COFF, currently they are still loaded into memory but not
used. I didn't do it in this patch because I noticed that `.lcovmap` and
`.lcovfunc` are loaded into memory. A separate patch will address it.
3. This should works with PGO when value profiling is disabled as debug
info correlation currently doing, though I haven't tested this yet.
Part 1 of 3. This includes the LLVM back-end processing and profile
reading/writing components. compiler-rt changes are included.
Differential Revision: https://reviews.llvm.org/D138846
This seems to cause Clang to crash, see comments on the code review. Reverting
until the problem can be investigated.
> Part 1 of 3. This includes the LLVM back-end processing and profile
> reading/writing components. compiler-rt changes are included.
>
> Differential Revision: https://reviews.llvm.org/D138846
This reverts commit a50486fd73.
Part 1 of 3. This includes the LLVM back-end processing and profile
reading/writing components. compiler-rt changes are included.
Differential Revision: https://reviews.llvm.org/D138846
As described in [0], this extends IRPGO to support //Temporal Profiling//.
When `-pgo-temporal-instrumentation` is used we add the `llvm.instrprof.timestamp()` intrinsic to the entry of functions which in turn gets lowered to a call to the compiler-rt function `INSTR_PROF_PROFILE_SET_TIMESTAMP()`. A new field in the `llvm_prf_cnts` section stores each function's timestamp. Then in `llvm-profdata merge` we convert these function timestamps into a //trace// and add it to the indexed profile.
Since these traces could significantly increase the profile size, we've added `-max-temporal-profile-trace-length` and `-temporal-profile-trace-reservoir-size` to limit the length of a trace and the number of traces in a profile, respectively.
In a future diff we plan to use these traces to construct an optimized function order to reduce the number of page faults during startup.
Special thanks to Julian Mestre for helping with reservoir sampling.
[0] https://discourse.llvm.org/t/rfc-temporal-profiling-extension-for-irpgo/68068
Reviewed By: snehasish
Differential Revision: https://reviews.llvm.org/D147287
This patch adds support for recording BuildIds usng the sanitizer
ListOfModules API. We add another entry to the SegmentEntry struct and
change the memprof raw version.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D145190
This patch adds support for recording BuildIds usng the sanitizer
ListOfModules API. We add another entry to the SegmentEntry struct and
change the memprof raw version.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D145190
Track min/max/avg access density (accesses per byte and accesses per
byte per lifetime second) metrics directly during profiling. This allows
more accurate use of these metrics in profile analysis and use, instead
of trying to compute them from already aggregated data in the profile.
This required regenerating some of the raw profile and executable inputs
for a few tests. While here, make the llvm-profdata memprof tests more
resilient to differences in things like memory mapping, timestamps and
cpu ids to make future test updates easier.
Differential Revision: https://reviews.llvm.org/D141558
This patch adds support for including binary ids in an indexed profile.
It adds a new field into the header that points to the offset of the
binary id section. The binary id section consists of a size of the
section, and a list of binary ids (if they are present) that consist
of two parts: length and data.
This patch guarantees that indexed profile is backwards compatible
after adding binary ids.
Differential Revision: https://reviews.llvm.org/D135929
This patch adds support for including binary ids in an indexed profile.
It adds a new field into the header that points to the offset of the
binary id section. The binary id section consists of a size of the
section, and a list of binary ids (if they are present) that consist
of two parts: length and data.
This patch guarantees that indexed profile is backwards compatible
after adding binary ids.
Differential Revision: https://reviews.llvm.org/D135929
that can lead to security vulnerabilities
Also, fix a few places that were causing -Wshadow and
-Wformat-nonliteral warnings to be emitted.
This reapplies the patch that was reverted in 0d66dc57e8 because it
broke a few bots.
I made changes so that cmake checks whether some of the flags are
supported by the compiler that is used before adding them to the list.
Also, I moved function add_security_warnings to CompilerRTUtils.cmake so
that it is defined before it's used.
Differential Revision: https://reviews.llvm.org/D131714
The existing code resulted in the max size and access counts being equal
to the min. Compute the max instead (max lifetime was already correct).
Differential Revision: https://reviews.llvm.org/D132515
that can lead to security vulnerabilities
Also, fix a few places that were causing -Wshadow and
-Wformat-nonliteral warnings to be emitted.
Differential Revision: https://reviews.llvm.org/D131714
This patch updates the existing default no-arg constructor for
MemInfoBlock to explicitly initialize all members. Also add missing
DataTypeId initialization to the other constructor. These issues were
exposed by msan on patch D121179. With this patch D121179 builds cleanly
on msan.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D122260
This patch adds support for optional memory profile information to be
included with and indexed profile. The indexed profile header adds a new
field which points to the offset of the memory profile section (if
present) in the indexed profile. For users who do not utilize this
feature the only overhead is a 64-bit offset in the header.
The memory profile section contains (1) profile metadata describing the
information recorded for each entry (2) an on-disk hashtable containing
the profile records indexed via llvm::md5(function_name). We chose to
introduce a separate hash table instead of the existing one since the
indexing for the instrumented fdo hash table is based on a CFG hash
which itself is perturbed by memprof instrumentation.
This commit also includes the changes reviewed separately in D120093.
Differential Revision: https://reviews.llvm.org/D120103
This reverts commit 85355a560a.
This patch adds support for optional memory profile information to be
included with and indexed profile. The indexed profile header adds a new
field which points to the offset of the memory profile section (if
present) in the indexed profile. For users who do not utilize this
feature the only overhead is a 64-bit offset in the header.
The memory profile section contains (1) profile metadata describing the
information recorded for each entry (2) an on-disk hashtable containing
the profile records indexed via llvm::md5(function_name). We chose to
introduce a separate hash table instead of the existing one since the
indexing for the instrumented fdo hash table is based on a CFG hash
which itself is perturbed by memprof instrumentation.
Differential Revision: https://reviews.llvm.org/D118653
This reverts commit e6999040f5.
Update test to fix signed int comparison warning, fix whitespace in
compiler-rt MIBEntryDef.inc file.
Differential Revision: https://reviews.llvm.org/D117256
This reverts commit 857ec0d01f.
Fixes -DLLVM_ENABLE_MODULES=On build by adding the new textual
header to the modulemap file.
Reviewed in https://reviews.llvm.org/D117722
This reverts commit 0f73fb18ca.
Use llvm/Profile/MIBEntryDef.inc instead of relative path.
Generated the raw profile data with `-mllvm
-enable-name-compression=false` so that builbots where the reader is
built without zlib do not fail.
Also updated the test build instructions.
This patch adds support for optional memory profile information to be
included with and indexed profile. The indexed profile header adds a new
field which points to the offset of the memory profile section (if
present) in the indexed profile. For users who do not utilize this
feature the only overhead is a 64-bit offset in the header.
The memory profile section contains (1) profile metadata describing the
information recorded for each entry (2) an on-disk hashtable containing
the profile records indexed via llvm::md5(function_name). We chose to
introduce a separate hash table instead of the existing one since the
indexing for the instrumented fdo hash table is based on a CFG hash
which itself is perturbed by memprof instrumentation.
Differential Revision: https://reviews.llvm.org/D118653
This patch refactors out the MemInfoBlock definition into a macro based
header which can be included to generate enums, structus and code for
each field recorded by the memprof profiling runtime.
Differential Revision: https://reviews.llvm.org/D117722
The definition of the MemInfoBlock is shared between the memprof
compiler-rt runtime and llvm/lib/ProfileData/. This change removes the
memprof_meminfoblock header and moves the struct to the shared include
file. To enable this sharing, the Print method is moved to the
memprof_allocator (the only place it is used) and the remaining uses are
updated to refer to the MemInfoBlock defined in the MemProfData.inc
file.
Also a couple of other minor changes which improve usability of the
types in MemProfData.inc.
* Update the PACKED macro to handle commas.
* Add constructors and equality operators.
* Don't initialize the buildid field.
Differential Revision: https://reviews.llvm.org/D116780
Use the llvm flag `-pgo-function-entry-coverage` to create single byte "counters" to track functions coverage. This mode has significantly less size overhead in both code and data because
* We mark a function as "covered" with a store instead of an increment which generally requires fewer assembly instructions
* We use a single byte per function rather than 8 bytes per block
The trade off of course is that this mode only tells you if a function has been covered. This is useful, for example, to detect dead code.
When combined with debug info correlation [0] we are able to create an instrumented Clang binary that is only 150M (the vanilla Clang binary is 143M). That is an overhead of 7M (4.9%) compared to the default instrumentation (without value profiling) which has an overhead of 31M (21.7%).
[0] https://groups.google.com/g/llvm-dev/c/r03Z6JoN7d4
Reviewed By: kyulee
Differential Revision: https://reviews.llvm.org/D116180
https://reviews.llvm.org/D116179 introduced some changes to
`InstrProfData.inc` which broke some downstream builds. This commit
reverts those changes since they only changes two field names.
Reviewed By: phosek
Differential Revision: https://reviews.llvm.org/D117631
Existing code tended to assume that counters had type `uint64_t` and
computed size from the number of counters. Fix this code to directly
compute the counters size in number of bytes where possible. When the
number of counters is needed, use `__llvm_profile_counter_entry_size()`
or `getCounterTypeSize()`. In a later diff these functions will depend
on the profile mode.
Change the meaning of `DataSize` and `CountersSize` to make them more clear.
* `DataSize` (`CountersSize`) - the size of the data (counter) section in bytes.
* `NumData` (`NumCounters`) - the number of data (counter) entries.
Reviewed By: kyulee
Differential Revision: https://reviews.llvm.org/D116179
Add the llvm flag `-debug-info-correlate` to attach debug info to instrumentation counters so we can correlate raw profile data to their functions. Raw profiles are dumped as `.proflite` files. The next diff enables `llvm-profdata` to consume `.proflite` and debug info files to produce a normal `.profdata` profile.
Part of the "lightweight instrumentation" work: https://groups.google.com/g/llvm-dev/c/r03Z6JoN7d4
The original diff https://reviews.llvm.org/D114565 was reverted because of the `Instrumentation/InstrProfiling/debug-info-correlate.ll` test, which is fixed in this commit.
Reviewed By: kyulee
Differential Revision: https://reviews.llvm.org/D115693
This reverts commit 800bf8ed29.
The `Instrumentation/InstrProfiling/debug-info-correlate.ll` test was
failing because I forgot the `llc` commands are architecture specific.
I'll follow up with a fix.
Differential Revision: https://reviews.llvm.org/D115689
Add the llvm flag `-debug-info-correlate` to attach debug info to instrumentation counters so we can correlate raw profile data to their functions. Raw profiles are dumped as `.proflite` files. The next diff enables `llvm-profdata` to consume `.proflite` and debug info files to produce a normal `.profdata` profile.
Part of the "lightweight instrumentation" work: https://groups.google.com/g/llvm-dev/c/r03Z6JoN7d4
Reviewed By: kyulee
Differential Revision: https://reviews.llvm.org/D114565
This commit adds initial support to llvm-profdata to read and print
summaries of raw memprof profiles.
Summary of changes:
* Refactor shared defs to MemProfData.inc
* Extend show_main to display memprof profile summaries.
* Add a simple raw memprof profile reader.
* Add a couple of tests to tools/llvm-profdata.
Differential Revision: https://reviews.llvm.org/D114286
This is to account for the change that made CountersPtr in __profd_
relative which landed in a1532ed275.
That change hasn't updated the raw profile version, and while the
profile layout stayed the same, profiles generated by tip-of-tree
LLVM are incompatible with 13.x tooling.
Differential Revision: https://reviews.llvm.org/D111123
This fixes support for merging profiles which broke as a consequence
of e50a38840d. The issue was missing
adjustment in merge logic to account for the binary IDs which are
now included in the raw profile just after header.
In addition, this change also:
* Includes the version in module signature that's used for merging
to avoid accidental attempts to merge incompatible profiles.
* Moves the binary IDs size field after version field in the header
as was suggested in the review.
Differential Revision: https://reviews.llvm.org/D107143
This fixes support for merging profiles which broke as a consequence
of e50a38840d. The issue was missing
adjustment in merge logic to account for the binary IDs which are
now included in the raw profile just after header.
In addition, this change also:
* Includes the version in module signature that's used for merging
to avoid accidental attempts to merge incompatible profiles.
* Moves the binary IDs size field after version field in the header
as was suggested in the review.
Differential Revision: https://reviews.llvm.org/D107143
This fixes support for merging profiles which broke as a consequence
of e50a38840d. The issue was missing
adjustment in merge logic to account for the binary IDs which are
now included in the raw profile just after header.
In addition, this change also:
* Includes the version in module signature that's used for merging
to avoid accidental attempts to merge incompatible profiles.
* Moves the binary IDs size field after version field in the header
as was suggested in the review.
Differential Revision: https://reviews.llvm.org/D107143
Change `CountersPtr` in `__profd_` to a label difference, which is a link-time
constant. On ELF, when linking a shared object, this requires that `__profc_` is
either private or linkonce/linkonce_odr hidden. On COFF, we need D104564 so that
`.quad a-b` (64-bit label difference) can lower to a 32-bit PC-relative relocation.
```
# ELF: R_X86_64_PC64 (PC-relative)
.quad .L__profc_foo-.L__profd_foo
# Mach-O: a pair of 8-byte X86_64_RELOC_UNSIGNED and X86_64_RELOC_SUBTRACTOR
.quad l___profc_foo-l___profd_foo
# COFF: we actually use IMAGE_REL_AMD64_REL32/IMAGE_REL_ARM64_REL32 so
# the high 32-bit value is zero even if .L__profc_foo < .L__profd_foo
# As compensation, we truncate CountersDelta in the header so that
# __llvm_profile_merge_from_buffer and llvm-profdata reader keep working.
.quad .L__profc_foo-.L__profd_foo
```
(Note: link.exe sorts `.lprfc` before `.lprfd` even if the object writer
has `.lprfd` before `.lprfc`, so we cannot work around by reordering
`.lprfc` and `.lprfd`.)
With this change, a stage 2 (`-DLLVM_TARGETS_TO_BUILD=X86 -DLLVM_BUILD_INSTRUMENTED=IR`)
`ld -pie` linked clang is 1.74% smaller due to fewer R_X86_64_RELATIVE relocations.
```
% readelf -r pie | awk '$3~/R.*/{s[$3]++} END {for (k in s) print k, s[k]}'
R_X86_64_JUMP_SLO 331
R_X86_64_TPOFF64 2
R_X86_64_RELATIVE 476059 # was: 607712
R_X86_64_64 2616
R_X86_64_GLOB_DAT 31
```
The absolute function address (used by llvm-profdata to collect indirect call
targets) can be converted to relative as well, but is not done in this patch.
Differential Revision: https://reviews.llvm.org/D104556