## Motivation
Since we don't need the metadata sections at runtime, we can somehow
offload them from memory at runtime. Initially, I explored [debug info
correlation](https://discourse.llvm.org/t/instrprofiling-lightweight-instrumentation/59113),
which is used for PGO with value profiling disabled. However, it
currently only works with DWARF and it's be hard to add such artificial
debug info for every function in to CodeView which is used on Windows.
So, offloading profile metadata sections at runtime seems to be a
platform independent option.
## Design
The idea is to use new section names for profile name and data sections
and mark them as metadata sections. Under this mode, the new sections
are non-SHF_ALLOC in ELF. So, they are not loaded into memory at runtime
and can be stripped away as a post-linking step. After the process
exits, the generated raw profiles will contains only headers + counters.
llvm-profdata can be used correlate raw profiles with the unstripped
binary to generate indexed profile.
## Data
For chromium base_unittests with code coverage on linux, the binary size
overhead due to instrumentation reduced from 64M to 38.8M (39.4%) and
the raw profile files size reduce from 128M to 68M (46.9%)
```
$ bloaty out/cov/base_unittests.stripped -- out/no-cov/base_unittests.stripped
FILE SIZE VM SIZE
-------------- --------------
+121% +30.4Mi +121% +30.4Mi .text
[NEW] +14.6Mi [NEW] +14.6Mi __llvm_prf_data
[NEW] +10.6Mi [NEW] +10.6Mi __llvm_prf_names
[NEW] +5.86Mi [NEW] +5.86Mi __llvm_prf_cnts
+95% +1.75Mi +95% +1.75Mi .eh_frame
+108% +400Ki +108% +400Ki .eh_frame_hdr
+9.5% +211Ki +9.5% +211Ki .rela.dyn
+9.2% +95.0Ki +9.2% +95.0Ki .data.rel.ro
+5.0% +87.3Ki +5.0% +87.3Ki .rodata
[ = ] 0 +13% +47.0Ki .bss
+40% +1.78Ki +40% +1.78Ki .got
+12% +1.49Ki +12% +1.49Ki .gcc_except_table
[ = ] 0 +65% +1.23Ki .relro_padding
+62% +1.20Ki [ = ] 0 [Unmapped]
+13% +448 +19% +448 .init_array
+8.8% +192 [ = ] 0 [ELF Section Headers]
+0.0% +136 +0.0% +80 [7 Others]
+0.1% +96 +0.1% +96 .dynsym
+1.2% +96 +1.2% +96 .rela.plt
+1.5% +80 +1.2% +64 .plt
[ = ] 0 -99.2% -3.68Ki [LOAD #5 [RW]]
+195% +64.0Mi +194% +64.0Mi TOTAL
$ bloaty out/cov-cor/base_unittests.stripped -- out/no-cov/base_unittests.stripped
FILE SIZE VM SIZE
-------------- --------------
+121% +30.4Mi +121% +30.4Mi .text
[NEW] +5.86Mi [NEW] +5.86Mi __llvm_prf_cnts
+95% +1.75Mi +95% +1.75Mi .eh_frame
+108% +400Ki +108% +400Ki .eh_frame_hdr
+9.5% +211Ki +9.5% +211Ki .rela.dyn
+9.2% +95.0Ki +9.2% +95.0Ki .data.rel.ro
+5.0% +87.3Ki +5.0% +87.3Ki .rodata
[ = ] 0 +13% +47.0Ki .bss
+40% +1.78Ki +40% +1.78Ki .got
+12% +1.49Ki +12% +1.49Ki .gcc_except_table
+13% +448 +19% +448 .init_array
+0.1% +96 +0.1% +96 .dynsym
+1.2% +96 +1.2% +96 .rela.plt
+1.2% +64 +1.2% +64 .plt
+2.9% +64 [ = ] 0 [ELF Section Headers]
+0.0% +40 +0.0% +40 .data
+1.2% +32 +1.2% +32 .got.plt
+0.0% +24 +0.0% +8 [5 Others]
[ = ] 0 -22.9% -872 [LOAD #5 [RW]]
-74.5% -1.44Ki [ = ] 0 [Unmapped]
[ = ] 0 -76.5% -1.45Ki .relro_padding
+118% +38.8Mi +117% +38.8Mi TOTAL
```
A few things to note:
1. llvm-profdata doesn't support filter raw profiles by binary id yet,
so when a raw profile doesn't belongs to the binary being digested by
llvm-profdata, merging will fail. Once this is implemented,
llvm-profdata should be able to only merge raw profiles with the same
binary id as the binary and discard the rest (with mismatched/missing
binary id). The workflow I have in mind is to have scripts invoke
llvm-profdata to get all binary ids for all raw profiles, and
selectively choose the raw pnrofiles with matching binary id and the
binary to llvm-profdata for merging.
2. Note: In COFF, currently they are still loaded into memory but not
used. I didn't do it in this patch because I noticed that `.lcovmap` and
`.lcovfunc` are loaded into memory. A separate patch will address it.
3. This should works with PGO when value profiling is disabled as debug
info correlation currently doing, though I haven't tested this yet.
Add __memprof_profile_reset() interface which can be used to facilitate
dumping multiple rounds of profiles from a single binary run. This
closes the current file descriptor and resets the internal file
descriptor to invalid (-1), which ensures the underlying writer reopens
the recorded profile filename. This can be used once the client is done
moving or copying a dumped profile, to prepare for reinvoking profile
dumping.
As an intrinsic, `_ReturnAddress` does not need it; additionally,
if someone else declares `_ReturnAddress` without `__cdecl` (for
example, `<intrin.h>`)
Additionally, actually add a test for this change. I've tested it
locally with both LLVM and MSVC.
This adds a new helper that can be called from application code to
ensure that no mutexes are held on specific code paths. This is useful
for multiple scenarios, including ensuring no locks are held:
- at thread exit
- in peformance-critical code
- when a coroutine is suspended (can cause deadlocks)
See this discourse thread for more discussion:
https://discourse.llvm.org/t/add-threadsanitizer-check-to-prevent-coroutine-suspending-while-holding-a-lock-potential-deadlock/74051
This resubmits and fixes#69372 (was reverted because of build
breakage).
This also includes the followup change #71471 (to fix a land race).
The new lit test fails, see comment on the PR. This also reverts
the follow-up commit, see below.
> This adds a new helper that can be called from application code to
> ensure that no mutexes are held on specific code paths. This is useful
> for multiple scenarios, including ensuring no locks are held:
>
> - at thread exit
> - in peformance-critical code
> - when a coroutine is suspended (can cause deadlocks)
>
> See this discourse thread for more discussion:
>
> https://discourse.llvm.org/t/add-threadsanitizer-check-to-prevent-coroutine-suspending-while-holding-a-lock-potential-deadlock/74051
This reverts commit bd841111f3.
This reverts commit 16a395b74d.
in https://github.com/llvm/llvm-project/pull/69625 @strega-nil added
cdecl to a huge number of sanitizer interface declarations. It looks
like she was racing against @kennyyu adding a tsan interface function. I
noticed this when merging in the latest changes from llvm/main and
corrected it.
Co-authored-by: Charlie Barto <Charles.Barto@microsoft.com>
Public headers intended for user code should not define `__has_feature`,
because this can break preprocessor checks done later in user code, e.g.
if they test `#ifdef __has_feature` to check for real support in the
compiler.
Replace the only use in the public header with a check for it being
supported before trying to use it. Define the fallback definition in the
internal headers, so that other internal sanitizer headers can continue
to use it as preferred.
This resolves a bug reported to GCC as https://gcc.gnu.org/PR109882
This is necessary for many projects which pass `/Gz` to their compiles,
which makes their default calling convention `__stdcall`.
(personal note, I _really_ wish there was a pragma for this)
Part 1 of 3. This includes the LLVM back-end processing and profile
reading/writing components. compiler-rt changes are included.
Differential Revision: https://reviews.llvm.org/D138846
This seems to cause Clang to crash, see comments on the code review. Reverting
until the problem can be investigated.
> Part 1 of 3. This includes the LLVM back-end processing and profile
> reading/writing components. compiler-rt changes are included.
>
> Differential Revision: https://reviews.llvm.org/D138846
This reverts commit a50486fd73.
Part 1 of 3. This includes the LLVM back-end processing and profile
reading/writing components. compiler-rt changes are included.
Differential Revision: https://reviews.llvm.org/D138846
This makes the implicit conversion that is happening explicit.
Otherwise, each user is forced to suppress this
implicit-integer-sign-change runtime error in their their UBSAN
suppressions file.
For example, the runtime error might look like:
runtime error: implicit conversion from type 'long' of value -9223372036854775808 (64-bit, signed) to type 'uint64_t' (aka 'unsigned long') changed the value to 9223372036854775808 (64-bit, unsigned)
#0 0x55fe29dea91d in long FuzzedDataProvider::ConsumeIntegralInRange<long>(long, long) src/./test/fuzz/FuzzedDataProvider.h:233:25
[...]
SUMMARY: UndefinedBehaviorSanitizer: implicit-integer-sign-change test/fuzz/FuzzedDataProvider.h:233:25 in
Differential Revision: https://reviews.llvm.org/D155206
The Clang built-in function is void __xray_typedevent(size_t, const void *, size_t),
but the LLVM intrinsics has smaller integer types. Since we only allow
64-bit ELF/Mach-O targets, we can change llvm.xray.typedevent to
i64/ptr/i64.
This allows encoding more information and avoids i16 legalization for
many non-X86 targets.
fdrLoggingHandleTypedEvent only supports uint16_t event type.
The primary motivation for this change is to allow FreeHooks to obtain
the allocated size of the pointer being freed in a fast, efficient manner.
Differential Revision: https://reviews.llvm.org/D151360
Revert "[xray] Ignore -Wc++20-extensions in xray_records.h [NFC]"
Not needed. The fix is 3826a74fc7.
This reverts commit 231c1d4134.
This reverts commit 7f191e6d2c.
This revision updates the description of
`__sanitizer_annotate_contiguous_container` in includes. Possibilites of
the function were changed in D132522 and it supports:
- unaligned beginning,
- shared first/last granule with other objects.
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D149341
This broke the lit tests on Mac, see comment on the code review.
> This change the types to match the ones used in:
> Darwin/debug_external.cpp
> debugging.cpp
>
> Reviewed By: vitalybuka
>
> Differential Revision: https://reviews.llvm.org/D148214
This reverts commit ea7d6e658e.
This change the types to match the ones used in:
Darwin/debug_external.cpp
debugging.cpp
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D148214
It broke lit tests on Mac, see comments on the code review.
> Reviewed By: vitalybuka, dvyukov
>
> Differential Revision: https://reviews.llvm.org/D147337
This reverts commit ebb0f1d063 and
follow-up commit 3c83aeee6b.
As described in [0], this extends IRPGO to support //Temporal Profiling//.
When `-pgo-temporal-instrumentation` is used we add the `llvm.instrprof.timestamp()` intrinsic to the entry of functions which in turn gets lowered to a call to the compiler-rt function `INSTR_PROF_PROFILE_SET_TIMESTAMP()`. A new field in the `llvm_prf_cnts` section stores each function's timestamp. Then in `llvm-profdata merge` we convert these function timestamps into a //trace// and add it to the indexed profile.
Since these traces could significantly increase the profile size, we've added `-max-temporal-profile-trace-length` and `-temporal-profile-trace-reservoir-size` to limit the length of a trace and the number of traces in a profile, respectively.
In a future diff we plan to use these traces to construct an optimized function order to reduce the number of page faults during startup.
Special thanks to Julian Mestre for helping with reservoir sampling.
[0] https://discourse.llvm.org/t/rfc-temporal-profiling-extension-for-irpgo/68068
Reviewed By: snehasish
Differential Revision: https://reviews.llvm.org/D147287
D147005 introduced __sanitizer_get_allocated_begin, with a return
value of void*. This involved a few naughty casts that dropped the
const. This patch adds back the const qualifier.
Differential Revision: https://reviews.llvm.org/D147489
This patch adds support for recording BuildIds usng the sanitizer
ListOfModules API. We add another entry to the SegmentEntry struct and
change the memprof raw version.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D145190
This patch adds support for recording BuildIds usng the sanitizer
ListOfModules API. We add another entry to the SegmentEntry struct and
change the memprof raw version.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D145190
The '__' prefix should only be used for the parts of the ORC runtime that
implement compiler / loader runtime details (e.g. ORC-RT's __tlv_get_addr
implementations).
This patch only fixes the public API. Future changes will fix internal names.
Track min/max/avg access density (accesses per byte and accesses per
byte per lifetime second) metrics directly during profiling. This allows
more accurate use of these metrics in profile analysis and use, instead
of trying to compute them from already aggregated data in the profile.
This required regenerating some of the raw profile and executable inputs
for a few tests. While here, make the llvm-profdata memprof tests more
resilient to differences in things like memory mapping, timestamps and
cpu ids to make future test updates easier.
Differential Revision: https://reviews.llvm.org/D141558
This patch adds support for including binary ids in an indexed profile.
It adds a new field into the header that points to the offset of the
binary id section. The binary id section consists of a size of the
section, and a list of binary ids (if they are present) that consist
of two parts: length and data.
This patch guarantees that indexed profile is backwards compatible
after adding binary ids.
Differential Revision: https://reviews.llvm.org/D135929
This patch adds support for including binary ids in an indexed profile.
It adds a new field into the header that points to the offset of the
binary id section. The binary id section consists of a size of the
section, and a list of binary ids (if they are present) that consist
of two parts: length and data.
This patch guarantees that indexed profile is backwards compatible
after adding binary ids.
Differential Revision: https://reviews.llvm.org/D135929
This revision is a part of a series of patches extending
AddressSanitizer C++ container overflow detection capabilities by adding
annotations, similar to those existing in std::vector, to std::string
and std::deque collections. These changes allow ASan to detect cases
when the instrumented program accesses memory which is internally
allocated by the collection but is still not in-use (accesses before or
after the stored elements for std::deque, or between the size and
capacity bounds for std::string).
The motivation for the research and those changes was a bug, found by
Trail of Bits, in a real code where an out-of-bounds read could happen
as two strings were compared via a std::equals function that took
iter1_begin, iter1_end, iter2_begin iterators (with a custom comparison
function). When object iter1 was longer than iter2, read out-of-bounds
on iter2 could happen. Container sanitization would detect it.
This revision adds a new compiler-rt ASan sanitization API function
sanitizer_annotate_double_ended_contiguous_container necessary to
sanitize/annotate double ended contiguous containers. Note that that
function annotates a single contiguous memory buffer (for example the
std::deque's internal chunk). Such containers have the beginning of
allocated memory block, beginning of the container in-use data, end of
the container's in-use data and the end of the allocated memory block.
This also adds a new API function to verify if a double ended contiguous
container is correctly annotated
(__sanitizer_verify_double_ended_contiguous_container).
Since we do not modify the ASan's shadow memory encoding values, the
capability of sanitizing/annotating a prefix of the internal contiguous
memory buffer is limited – up to SHADOW_GRANULARITY-1 bytes may not be
poisoned before the container's in-use data. This can cause false
negatives (situations when ASan will not detect memory corruption in
those areas).
On the other hand, API function interfaces are designed to work even if
this caveat would not exist. Therefore implementations using those
functions will poison every byte correctly, if only ASan (and
compiler-rt) is extended to support it. In other words, if ASan was
modified to support annotating/poisoning of objects lying on addresses
unaligned to SHADOW_GRANULARITY (so e.g. prefixes of those blocks),
which would require changing its shadow memory encoding, this would not
require any changes in the libcxx std::string/deque code which is added
in further commits of this patch series.
If you have any questions, please email:
advenam.tacet@trailofbits.comdisconnect3d@trailofbits.com
Differential Revision: https://reviews.llvm.org/D132090
When COMPILER_RT_BUILD_MEMPROF is disabled, the memprof headers should not be installed.
Reviewed By: mgorny, tejohnson
Differential Revision: https://reviews.llvm.org/D136550
This change allows users manually calling memprof public C API (e.g. __memprof_profile_dump).
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D136067
The ORC runtime isn't used by clang -- the prefix was just cargo-culted with
the rest of the XRay config when the ORC runtime was introduced. We now want to
make parts of it available for clients to link directly, so this seems like a
good time to fix the name.