Commit Graph

425 Commits

Author SHA1 Message Date
sinan
c3bbc3a57d [BOLT] Fix logs with no hex convension (#112650)
Add `utohexstr` to ensure that offsets/addresses are correctly formatted
as hexadecimal values.
2024-10-18 09:46:41 +08:00
ShatianWang
4cab01f072 [BOLT] Profile quality stats -- CFG discontinuity (#109683)
In a perfect profile, each positive-execution-count block in the
function’s CFG should be reachable from a positive-execution-count
function entry block through a positive-execution-count path. This new
pass checks how well the BOLT input profile satisfies this “CFG
continuity” property.

More specifically, for each of the hottest 1000 functions, the pass
calculates the function’s fraction of basic block execution counts that
is “unreachable”. It then reports the 95th percentile of the
distribution of the 1000 unreachable fractions in a single BOLT-INFO
line. The smaller the reported value is, the better the BOLT profile
satisfies the CFG continuity property.

The default value of 1000 above can be changed via the hidden BOLT
option `-num-functions-for-continuity-check=[N]`. If more detailed stats
are needed, `-v=1` can be added to the BOLT invocation: the hottest N
functions will be grouped into 5 equally-sized buckets, from the hottest
to the coldest; for each bucket, various summary statistics of the
distribution of the fractions and the raw unreachable execution counts
will be reported.
2024-10-08 19:07:43 -04:00
Youngsuk Kim
0a5edb4de4 [bolt] Don't call llvm::raw_string_ostream::flush() (NFC)
Don't call raw_string_ostream::flush(), which is essentially a no-op.
As specified in the docs, raw_string_ostream is always unbuffered.
( 65b13610a5 for further reference )
2024-09-23 17:07:11 -05:00
Kristof Beyls
6d216fb7b8 [perf2bolt] Improve heuristic to map in-process addresses to specific… (#109397)
… segments in Elf binary.

The heuristic is improved by also taking into account that only
executable segments should contain instructions.

Fixes #109384.
2024-09-23 15:14:51 +02:00
sinan
31ac3d092b [BOLT] Add .iplt support to x86 (#106513)
Add X86 support for parsing .iplt section and symbols.
2024-09-23 18:22:43 +08:00
Amir Ayupov
86ec59e2f7 [BOLT] Only parse probes for profiled functions in profile-write-pseudo-probes mode (#106365)
Implement selective probe parsing for profiled functions only when
emitting probe information to YAML profile as suggested in

https://github.com/llvm/llvm-project/pull/102904#pullrequestreview-2248714190

For a large binary, this reduces probe parsing 
- processing time from 10.5925s to 5.6295s,
- peak RSS from 10.54 to 7.98 GiB.
2024-09-11 16:33:34 -07:00
Amir Ayupov
c820bd3e33 [BOLT][NFC] Rename profile-use-pseudo-probes
The flag currently controls writing of probe information in YAML
profile. #99891 adds a separate flag to use probe information for stale
profile matching. Thus `profile-use-pseudo-probes` becomes a misnomer
and `profile-write-pseudo-probes` better captures the intent.

Reviewers: maksfb, WenleiHe, ayermolo, rafaelauler, dcci

Reviewed By: rafaelauler

Pull Request: https://github.com/llvm/llvm-project/pull/106364
2024-09-11 16:27:33 -07:00
Amir Ayupov
a66ce58ac6 [BOLT] Drop suffixes in parsePseudoProbe GUID assignment (#106243)
Pseudo probe function records contain GUIDs assigned by the compiler
using an IR function name. Thus suffixes added later (e.g. `.llvm.`
for internal symbols, `.destroy`/`.resume` for coroutine fragments, 
and `.cold`/`.warm` for split fragments) cause GUID mismatch.

Address that by dropping those suffixes using `getCommonName` which is
a parametrized form of `getLTOCommonName`.
2024-09-11 14:42:51 -07:00
Amir Ayupov
a79cf0228e [MC][NFC] Use vector for GUIDProbeFunctionMap
Replace unordered_map with a vector. Pre-parse the section to statically
allocate storage. Use BumpPtrAllocator for FuncName strings, keep
StringRef in FuncDesc.

Reduces peak RSS of pseudo probe parsing from 9.08 GiB to 8.89 GiB as
part of perf2bolt with a large binary.

Test Plan:
```
bin/llvm-lit -sv test/tools/llvm-profgen
```

Reviewers: wlei-llvm, rafaelauler, dcci, maksfb, ayermolo

Reviewed By: wlei-llvm

Pull Request: https://github.com/llvm/llvm-project/pull/102905
2024-08-26 09:15:53 -07:00
Amir Ayupov
ee09f7d1fc [MC][NFC] Reduce Address2ProbesMap size
Replace the map from addresses to list of probes with a flat vector
containing probe references sorted by their addresses.

Reduces pseudo probe parsing time from 9.56s to 8.59s and peak RSS from
9.66 GiB to 9.08 GiB as part of perf2bolt processing a large binary.

Test Plan:
```
bin/llvm-lit -sv test/tools/llvm-profgen
```

Reviewers: maksfb, rafaelauler, dcci, ayermolo, wlei-llvm

Reviewed By: wlei-llvm

Pull Request: https://github.com/llvm/llvm-project/pull/102904
2024-08-26 09:14:35 -07:00
Amir Ayupov
04ebd1907c [MC][NFC] Statically allocate storage for decoded pseudo probes and function records
Use #102774 to allocate storage for decoded probes (`PseudoProbeVec`)
and function records (`InlineTreeVec`).

Leverage that to also shrink sizes of `MCDecodedPseudoProbe`:
- Drop Guid since it's accessible via `InlineTree`.

`MCDecodedPseudoProbeInlineTree`:
- Keep track of probes and inlinees using `ArrayRef`s now that probes
  and function records belonging to the same function are allocated
  contiguously.

This reduces peak RSS from 13.7 GiB to 9.7 GiB and pseudo probe parsing
time (as part of perf2bolt) from 15.3s to 9.6s for a large binary with
400MiB .pseudo_probe section containing 43M probes and 25M function
records.

Depends on:
#102774
#102787
#102788

Reviewers: maksfb, rafaelauler, dcci, ayermolo, wlei-llvm

Reviewed By: wlei-llvm

Pull Request: https://github.com/llvm/llvm-project/pull/102789
2024-08-26 09:09:13 -07:00
Amir Ayupov
121ed07975 [MC][NFC] Count pseudo probes and function records
Pre-parse pseudo probes section counting the number of probes and
function records. These numbers are used in follow-up diff to
pre-allocate vectors for decoded probes and inline tree nodes.

Additional benefit is avoiding error handling during parsing.

This pre-parsing is fast: for a 404MiB .pseudo_probe section with
43373881 probes and 25228770 function records, it only takes 0.68±0.01s.
The total time of buildAddress2ProbeMap is 21s.

Reviewers: dcci, maksfb, rafaelauler, wlei-llvm, ayermolo

Reviewed By: wlei-llvm

Pull Request: https://github.com/llvm/llvm-project/pull/102774
2024-08-26 09:05:34 -07:00
Sayhaan Siddiqui
6aad62cf5b [BOLT][DWARF] Add parallelization for processing of DWO debug information (#100282)
Enables parallelization for the processing of DWO CUs.
2024-08-08 16:41:51 -07:00
Davide Italiano
e49549ff19 Revert "[BOLT] Abort on out-of-section symbols in GOT (#100801)"
This reverts commit a4900f0d93.
2024-08-07 20:52:19 -07:00
Vladislav Khmelevsky
445023f173 Revert "[BOLT] Move ADRRelaxationPass (#101371)" (#102333)
This reverts commit 750b12f06b.
The pass should run after splitting phase, but before nop removal
2024-08-07 21:03:51 +04:00
Sayhaan Siddiqui
62e894e0d7 [BOLT][DWARF][NFC] Move Arch assignment out of createBinaryContext (#102054)
Moves the assignment of Arch out of createBinaryContext to prevent data
races when parallelized.
2024-08-07 16:55:39 +00:00
Vladislav Khmelevsky
a4900f0d93 [BOLT] Abort on out-of-section symbols in GOT (#100801)
This patch aborts BOLT execution if it finds out-of-section (section
end) symbol in GOT table. In order to handle such situations properly in
future, we would need to have an arch-dependent way to analyze
relocations or its sequences, e.g., for ARM it would probably be ADRP +
LDR analysis in order to get GOT entry address. Currently, it is also
challenging because GOT-related relocation symbols are replaced to
__BOLT_got_zero. Anyway, it seems to be quite a rare case, which seems
to be only? related to static binaries. For the most part, it seems that
it should be handled on the linker stage, since static binary should not
have GOT table at all. LLD linker with relaxations enabled would replace
instruction addresses from GOT directly to target symbols, which
eliminates the problem.

Anyway, in order to achieve detection of such cases, this patch fixes a
few things in BOLT:
1. For the end symbols, we're now using the section provided by ELF
binary. Previously it would be tied with a wrong section found by symbol
address.
2. The end symbols would have limited registration we would only
add them in name->data GlobalSymbols map, since using address->data
BinaryDataMap map would likely be impossible due to address duality of
such symbols.
3. The outdated BD->getSection (currently returning refence, not
pointer) check in postProcessSymbolTable is replaced by getSize check in
order to allow zero-sized top-level symbols if they are located in
zero-sized sections. For the most part, such things could only be found
in tests, but I don't see a reason not to handle such cases.
4. Updated section-end-sym test and removed x86_64 requirement since
there is no reason for this (tested on aarch64 linux)

The test was provided by peterwaller-arm (thank you) in #100096 and
slightly modified by me.
2024-08-07 16:26:12 +04:00
Vladislav Khmelevsky
097ddd3565 [BOLT] Fix relocations handling (#100890)
After porting BOLT to RISCV some of the relocations were broken on both
AArch64 and X86.
On AArch64 the example of broken relocations would be GOT, during
handling them, we should replace the symbol to __BOLT_got_zero in order
to address GOT entry, not the symbol that addresses this entry. This is
done further in code, so it is too early to add rel here.
On X86 it is a mistake to add relocations without addend. This is the
exact problem that is raised on #97937. Due to different code generation
I had to use gcc-generated yaml test, since with clang I wasn't able to
reproduce problem.
Added tests for both architectures and made the problematic condition
riscV-specific.
2024-08-07 16:25:46 +04:00
Vladislav Khmelevsky
750b12f06b [BOLT] Move ADRRelaxationPass (#101371)
For non-simple functions we need nop instruction to be presented to
transform ADR to ADRP+ADD sequence, so run this pass before remove nops
pass.
2024-08-07 16:23:38 +04:00
sinan
6c8933e1a0 [BOLT] Skip PLT search for zero-value weak reference symbols (#69136)
Take a common weak reference pattern for example
```
    __attribute__((weak)) void undef_weak_fun();
    
      if (&undef_weak_fun)
        undef_weak_fun();
```
    
In this case, an undefined weak symbol `undef_weak_fun` has an address
of zero, and Bolt incorrectly changes the relocation for the
corresponding symbol to symbol@PLT, leading to incorrect runtime
behavior.
2024-08-07 18:02:42 +08:00
sinan
734c0488b6 [BOLT] Support map other function entry address (#101466)
Allow BOLT to map the old address to a new binary address if the old
address is the entry of the function.
2024-08-07 15:57:25 +08:00
Amir Ayupov
3f51bec466 [BOLT][NFC] Print timers in perf2bolt invocation
When BOLT is run in AggregateOnly mode (perf2bolt), it exits with code
zero so destructors are not run thus TimerGroup never prints the timers.

Add explicit printing just before the exit to honor options requesting
timers (`--time-rewrite`, `--time-aggr`).

Test Plan: updated bolt/test/timers.c

Reviewers: ayermolo, maksfb, rafaelauler, dcci

Reviewed By: dcci

Pull Request: https://github.com/llvm/llvm-project/pull/101270
2024-07-31 22:14:52 -07:00
Amir Ayupov
fb97b4f962 [BOLT][NFC] Add timers for MetadataManager invocations
Test Plan: added bolt/test/timers.c

Reviewers: ayermolo, maksfb, rafaelauler, dcci

Reviewed By: dcci

Pull Request: https://github.com/llvm/llvm-project/pull/101267
2024-07-31 22:12:34 -07:00
Sayhaan Siddiqui
910012e7c5 [BOLT][DWARF][NFC] Split DIEBuilder::finish (#101244)
Split DIEBuilder::finish so that code updating .debug_names is in a
separate function.
2024-07-31 13:41:38 -07:00
Sayhaan Siddiqui
79dcd93b70 [BOLT][DWARF] Remove option to write to DWP (#100771)
Remove the --write-dwp option as well as related code and tests.
2024-07-30 16:58:01 -07:00
Sayhaan Siddiqui
9a3e66e314 [BOLT][DWARF][NFC] Fix DebugStrOffsetsWriter (#100672)
Fix DebugStrOffsetsWriter so updateAddressMap can't be called after it
is finalized.
2024-07-26 18:58:25 -07:00
Sayhaan Siddiqui
b33ef5bd68 [BOLT][DWARF][NFC] Add mc opt to DWARFRewriter.cpp (#100800)
Running into an error with removing DWP where the assertion
`RelaxAllView &&
"RegisterMCTargetOptionsFlags not created."'` failed. This is a result
of DWP bringing the mc::RegisterMCTargetOptionsFlags option in, and the
option being removed with DWP. The need for this option didn't
originally exist because we didn't use MC in DWARFRewriter, but we
switched to using DWARFStreamer which needed the option.

https://reviews.llvm.org/D75579 
https://reviews.llvm.org/D106417
2024-07-26 14:09:46 -07:00
Amir Ayupov
4d19676de4 [BOLT] Add profile-use-pseudo-probes option
Move pseudo probe profile generation under --profile-use-pseudo-probes
option. Note that updating pseudo probes is independent from this flag.

Test Plan: updated pseudoprobe-decoding-inline.test

Reviewers: maksfb, rafaelauler, ayermolo, dcci, WenleiHe

Reviewed By: WenleiHe

Pull Request: https://github.com/llvm/llvm-project/pull/100299
2024-07-24 07:31:01 -07:00
Sayhaan Siddiqui
ea4a348098 [BOLT][DWARF][NFC] Move initialization of DWOName outside of lambda (#99728)
Followup to the splitting of processUnitDIE, moves code that accesses
common resource to be outside of the function that will be parallelized.

Followup to #99957
2024-07-23 17:30:54 -07:00
Sayhaan Siddiqui
7cd7a1eab4 [BOLT][DWARF][NFC] Split processUnitDIE into two lambdas (#99957)
Split processUnitDIE into two lambdas to separate the processing of DWO
CUs and CUs in the main binary.
2024-07-23 12:59:40 -07:00
Sayhaan Siddiqui
bdee9b05de Revert "[BOLT][DWARF][NFC] Split processUnitDIE into two lambdas" (#99904)
Reverts llvm/llvm-project#99225
2024-07-22 12:31:51 -07:00
Sayhaan Siddiqui
6747f12931 [BOLT][DWARF][NFC] Split processUnitDIE into two lambdas (#99225)
Split processUnitDIE into two lambdas to separate the processing of DWO
CUs and CUs in the main binary.
2024-07-19 17:52:49 -07:00
Daniel Hill
b686600a57 [BOLT] Skip instruction shortening (#93032)
Add the ability to disable the instruction shortening pass through
--shorten-instructions=false
2024-07-19 16:52:01 -07:00
Sayhaan Siddiqui
d54ec64f67 [BOLT][DWARF] Remove deprecated opt (#99575)
Remove deprecated DeterministicDebugInfo option and its uses.
2024-07-19 14:03:50 -07:00
Amir Ayupov
9b007a199d [BOLT] Expose pseudo probe function checksum and GUID (#99389)
Add a BinaryFunction field for pseudo probe function GUID.
Populate it during pseudo probe section parsing, and emit it in YAML
profile (both regular and BAT), along with function checksum.

To be used for stale function matching.

Test Plan: update pseudoprobe-decoding-inline.test
2024-07-18 20:58:16 -07:00
Sayhaan Siddiqui
c0c157a518 [BOLT][DWARF][NFC] Remove DWO ranges base (#99284)
Removes getters and setters for DWO ranges base due to it not being
used.
2024-07-18 09:24:46 -07:00
Vladislav Khmelevsky
51122fb446 [BOLT][NFC] Fix build (#99361)
On clang 14 the build is failing with:
reference to local binding 'ParentName' declared in enclosing function
'llvm::bolt::RewriteInstance::registerFragments'
2024-07-17 23:17:12 +04:00
Amir Ayupov
3fe50b6dde [BOLT] Store FileSymRefs in a multimap
With aggressive ICF, it's possible to have different local symbols
(under different FILE symbols) to be mapped to the same address.

FileSymRefs only keeps a single SymbolRef per address, which prevents
fragment matching from finding the correct symbol to perform parent
function lookup.

Work around this issue by switching FileSymRefs to a multimap. In
future, uses of FileSymRefs can be replaced with SortedSymbols which
keeps essentially the same information.

Test Plan: added ambiguous_fragment.test

Reviewers: dcci, ayermolo, maksfb, rafaelauler

Reviewed By: rafaelauler

Pull Request: https://github.com/llvm/llvm-project/pull/98992
2024-07-16 22:14:43 -07:00
Sayhaan Siddiqui
e140a8a3c8 [BOLT][DWARF][NFC] Refactor address writers (#98094)
Refactors address writers to create an instance for each CU and its DWO
CU.
2024-07-15 23:03:43 -07:00
Sayhaan Siddiqui
7e10ad99ad [BOLT][DWARF] Cleanup buffer initialization for DWO range writer (#97843)
Cleanup buffer initialization for DWO range writer instances to remove
empty buffer at the beginning.
2024-07-10 11:35:40 -07:00
Sayhaan Siddiqui
a972b2e9a4 [BOLT][DWARF][NFC] Cleanup RangesBase check (#97840)
Moves check for RangesBase under check for UnitDie. This makes the flow
clearer because we add RangesBase when it is a UnitDie.
2024-07-10 10:53:08 -07:00
Sayhaan Siddiqui
d283627c4a [BOLT][DWARF][NFC] Update Die to not use std::optional (#97844)
Updates initialization to remove unnecessary use of std::optional.
2024-07-09 16:37:09 -07:00
Sayhaan Siddiqui
a40daa34ef [BOLT][DWARF][NFC] Cleanup version check (#97839)
Cleans up version check to remove redundant else branch.
2024-07-09 16:36:26 -07:00
Sayhaan Siddiqui
5828b04b03 [BOLT][DWARF] Refactor legacy ranges writers (#96006)
Refactors legacy ranges writers to create a writer for each instance of
a DWO file.

We now write out everything into .debug_ranges after the all the DWO
files are processed. This also changes the order that ranges is written
out in, as before we wrote out while in the main CU processing loop and
we now iterate through the CU buckets created by partitionCUs, after the
main processing loop.
2024-07-03 14:50:40 -07:00
Amir Ayupov
344228ebf4 [BOLT] Drop macro-fusion alignment (#97358)
9d0754ada5 dropped MC support required for
optimal macro-fusion alignment in BOLT. Remove the support in BOLT as
performance measurements with large binaries didn't show a significant
improvement.

Test Plan:
macro-fusion alignment was never upstreamed, so no upstream tests are
affected.
2024-07-02 09:20:41 -07:00
Fangrui Song
e3e0df391c [BOLT] Replace the MCAsmLayout parameter with MCAssembler
Continue the MCAsmLayout removal work started by 67957a45ee.
2024-07-01 18:02:34 -07:00
Fangrui Song
dbf12b2f77 [MC] Remove MCAsmLayout::{getSymbolOffset,getBaseSymbol}
The MCAsmLayout::* forwarders added by
67957a45ee have all been removed.
2024-07-01 11:51:26 -07:00
Shaw Young
49fdbbcfed [BOLT] Match functions with exact hash (#96572)
Added flag '--match-profile-with-function-hash' to match functions 
based on exact hash. After identical and LTO name matching, more 
functions can be recovered for inference with exact hash, in the case
of function renaming with no functional changes. Collisions are 
possible in the unlikely case where multiple functions share the same
exact hash. The flag is off by default as it requires the processing of 
all binary functions and subsequently is expensive.

Test Plan: added hashing-based-function-matching.test.
2024-06-29 21:19:00 -07:00
Maksim Panchenko
d16b21b17d [BOLT][Linux] Support ORC for alternative instructions (#96709)
Alternative instruction sequences in the Linux kernel can modify the
stack and thus they need their own ORC unwind entries. Since there's
only one ORC table, it has to be "shared" among multiple instruction
sequences. The kernel achieves this by putting a restriction on
instruction boundaries. If ORC state changes at a given IP, only one of
the alternative sequences can have an instruction starting/ending at
this IP. Then, developers can insert NOPs to guarantee the above
requirement is met.

The most common use of ORC with alternatives is "pushf; pop %rax"
sequence used for paravirtualization. Note that newer kernel versions
no longer use .parainstructions; instead, they utilize alternatives for
the same purpose.

Before we implement a better support for alternatives, we can safely
skip ORC entries associated with them.

Fixes #87052.
2024-06-27 19:26:11 -07:00
shawbyoung
902952ae04 Revert "[𝘀𝗽𝗿] initial version"
This reverts commit bb5ab1ffe7.
2024-06-25 08:30:29 -07:00