Use SymbolStringPtr for Symbol names in LinkGraph. This reduces string interning
on the boundary between JITLink and ORC, and allows pointer comparisons (rather
than string comparisons) between Symbol names. This should improve the
performance and readability of code that bridges between JITLink and ORC (e.g.
ObjectLinkingLayer and ObjectLinkingLayer::Plugins).
To enable use of SymbolStringPtr a std::shared_ptr<SymbolStringPool> is added to
LinkGraph and threaded through to its construction sites in LLVM and Bolt. All
LinkGraphs that are to have symbol names compared by pointer equality must point
to the same SymbolStringPool instance, which in ORC sessions should be the pool
attached to the ExecutionSession.
---------
Co-authored-by: Lang Hames <lhames@gmail.com>
_init is used during startup of binaires. Unfortunately, its
address can be shared (at least on AArch64 glibc static binaries) with a
data
reference that lives in the GOT. The GOT rewriting is currently unable
to distinguish between data addresses and function addresses. This leads
to the data address being incorrectly rewritten, causing a crash on
startup of the binary:
Unexpected reloc type in static binary.
To avoid this, don't consider _init for being moved, by skipping it.
~We could add further conditions to narrow the skipped case for known
crashes, but as a straw man I thought it'd be best to keep the condition
as simple as possible and see if there any objections to this.~
(Edit: this broke the test
bolt/test/runtime/X86/retpoline-synthetic.test,
because _init was skipped from the retpoline pass and it has an indirect
call in it, so I include a check for static binaries now, which avoids
the test failure,
but perhaps this could/should be narrowed further?)
For now, skip _init for static binaries on any architecture; we could
add further conditions to narrow the skipped case for known crashes, but
as a straw man I thought it'd be best to keep the condition as simple as
possible and see if there any objections to this.
Updates #100096.
Under --use-old-text or --strict, we completely rewrite contents of EH
frames and exception tables sections. If new contents of either section
do not exceed the size of the original section, rewrite the section
in-place.
"Large" functions are functions that are too big to fit into their
original slots after code modifications. CheckLargeFunctions pass is
designed to prevent such functions from emission. Extend this pass to
work with functions with constant islands.
Now that CheckLargeFunctions covers all functions, it guarantees that we
will never see such functions after code emission on all platforms
(previously it was guaranteed on x86 only). Hence, we can get rid of
RewriteInstance extensions that were meant to support "large" functions.
AArch64 uses $d and $x symbols to delimit data embedded in code.
However, sometimes we see $d symbols, typically in .eh_frame, with
addresses that belong to different sections. These occasionally fall
inside .text functions and cause BOLT to stop disassembling, which in
turn causes DWARF CFA processing to fail.
As a workaround, we just ignore symbols with addresses outside the
section they belong to. This behaviour is consistent with objdump and
similar tools.
Don't call raw_string_ostream::flush(), which is essentially a no-op.
As specified in the docs, raw_string_ostream is always unbuffered.
( 65b13610a5 for further reference )
… segments in Elf binary.
The heuristic is improved by also taking into account that only
executable segments should contain instructions.
Fixes#109384.
This patch aborts BOLT execution if it finds out-of-section (section
end) symbol in GOT table. In order to handle such situations properly in
future, we would need to have an arch-dependent way to analyze
relocations or its sequences, e.g., for ARM it would probably be ADRP +
LDR analysis in order to get GOT entry address. Currently, it is also
challenging because GOT-related relocation symbols are replaced to
__BOLT_got_zero. Anyway, it seems to be quite a rare case, which seems
to be only? related to static binaries. For the most part, it seems that
it should be handled on the linker stage, since static binary should not
have GOT table at all. LLD linker with relaxations enabled would replace
instruction addresses from GOT directly to target symbols, which
eliminates the problem.
Anyway, in order to achieve detection of such cases, this patch fixes a
few things in BOLT:
1. For the end symbols, we're now using the section provided by ELF
binary. Previously it would be tied with a wrong section found by symbol
address.
2. The end symbols would have limited registration we would only
add them in name->data GlobalSymbols map, since using address->data
BinaryDataMap map would likely be impossible due to address duality of
such symbols.
3. The outdated BD->getSection (currently returning refence, not
pointer) check in postProcessSymbolTable is replaced by getSize check in
order to allow zero-sized top-level symbols if they are located in
zero-sized sections. For the most part, such things could only be found
in tests, but I don't see a reason not to handle such cases.
4. Updated section-end-sym test and removed x86_64 requirement since
there is no reason for this (tested on aarch64 linux)
The test was provided by peterwaller-arm (thank you) in #100096 and
slightly modified by me.
After porting BOLT to RISCV some of the relocations were broken on both
AArch64 and X86.
On AArch64 the example of broken relocations would be GOT, during
handling them, we should replace the symbol to __BOLT_got_zero in order
to address GOT entry, not the symbol that addresses this entry. This is
done further in code, so it is too early to add rel here.
On X86 it is a mistake to add relocations without addend. This is the
exact problem that is raised on #97937. Due to different code generation
I had to use gcc-generated yaml test, since with clang I wasn't able to
reproduce problem.
Added tests for both architectures and made the problematic condition
riscV-specific.
Take a common weak reference pattern for example
```
__attribute__((weak)) void undef_weak_fun();
if (&undef_weak_fun)
undef_weak_fun();
```
In this case, an undefined weak symbol `undef_weak_fun` has an address
of zero, and Bolt incorrectly changes the relocation for the
corresponding symbol to symbol@PLT, leading to incorrect runtime
behavior.
When BOLT is run in AggregateOnly mode (perf2bolt), it exits with code
zero so destructors are not run thus TimerGroup never prints the timers.
Add explicit printing just before the exit to honor options requesting
timers (`--time-rewrite`, `--time-aggr`).
Test Plan: updated bolt/test/timers.c
Reviewers: ayermolo, maksfb, rafaelauler, dcci
Reviewed By: dcci
Pull Request: https://github.com/llvm/llvm-project/pull/101270
Add a BinaryFunction field for pseudo probe function GUID.
Populate it during pseudo probe section parsing, and emit it in YAML
profile (both regular and BAT), along with function checksum.
To be used for stale function matching.
Test Plan: update pseudoprobe-decoding-inline.test
On clang 14 the build is failing with:
reference to local binding 'ParentName' declared in enclosing function
'llvm::bolt::RewriteInstance::registerFragments'
With aggressive ICF, it's possible to have different local symbols
(under different FILE symbols) to be mapped to the same address.
FileSymRefs only keeps a single SymbolRef per address, which prevents
fragment matching from finding the correct symbol to perform parent
function lookup.
Work around this issue by switching FileSymRefs to a multimap. In
future, uses of FileSymRefs can be replaced with SortedSymbols which
keeps essentially the same information.
Test Plan: added ambiguous_fragment.test
Reviewers: dcci, ayermolo, maksfb, rafaelauler
Reviewed By: rafaelauler
Pull Request: https://github.com/llvm/llvm-project/pull/98992
9d0754ada5 dropped MC support required for
optimal macro-fusion alignment in BOLT. Remove the support in BOLT as
performance measurements with large binaries didn't show a significant
improvement.
Test Plan:
macro-fusion alignment was never upstreamed, so no upstream tests are
affected.
Added flag '--match-profile-with-function-hash' to match functions
based on exact hash. After identical and LTO name matching, more
functions can be recovered for inference with exact hash, in the case
of function renaming with no functional changes. Collisions are
possible in the unlikely case where multiple functions share the same
exact hash. The flag is off by default as it requires the processing of
all binary functions and subsequently is expensive.
Test Plan: added hashing-based-function-matching.test.
Move functionality for patching build ID into a separate rewriter class
and change the way we do the patching. Support build ID in different
note sections in order to update the build ID in the Linux kernel binary
which puts in into ".notes" section instead of ".note.gnu.build-id".
CDSplit splits functions up to three ways: main fragment with no suffix,
and fragments with .cold and .warm suffixes.
Add .warm suffix to the regex used to recognize split fragments.
Test Plan: updated register-fragments-bolt-symbols.s
To align YAML and fdata profiles produced in BAT mode, lift two
restrictions applied in non-relocation mode when BAT is present:
1) register secondary entry points from ignored functions,
2) treat functions with secondary entry points as simple.
This allows constructing CFG for non-simple functions in non-relocation
mode and emitting YAML profile for them, which can then be used for
optimizations in relocation mode.
Test Plan: added test ignored-interprocedural-reference.s
Exempt special symbols (hot text/data and _end symbol) from normal
handling. We only need to set their value and make them absolute.
If these symbols are handled as normal symbols and if they alias
functions we may create non-sensical symbols, e.g. __hot_start.cold.
Test Plan: updated hot-end-symbol.s
Reviewers: maksfb, rafaelauler, ayermolo, dcci
Reviewed By: dcci, maksfb
Pull Request: https://github.com/llvm/llvm-project/pull/92713
Move code that checks for __bolt_reserved_{start,end} into a new
discoverBOLTReserved() function and call it from discoverFileObjects()
so that the reserved space info is accessible to passes. NFC for the
current set of binaries.
We use pwrite() in RewriteInstance to update contents of existing
sections. pwrite() requires file position to be set past the written
offset which we guarantee at the start of rewriteFile(). Then we had an
implicit assumption in patchBuildID() that the file position will be set
again in patchELFSymTabs() after being reset in patchELFPHDRTable().
That assumption was broken in #90300. The fix is to save and restore
file position in patchELFPHDRTable(). Then we don't have to update it
again in patchELFSymTabs().
Use known order of BOLT split function symbols: fragment symbols
immediately precede the parent fragment symbol.
Depends On: https://github.com/llvm/llvm-project/pull/89648
Test Plan: Added register-fragments-bolt-symbols.s
Allow the user to allocate space in a binary that could be used by BOLT
for allocating new sections. The reservation is specified by two special
symbols recognizable by BOLT: __bolt_reserved_{start,end}.
The reserved space will be useful for optimizing the Linux kernel where
we cannot allocate a new executable segment. However, the support is not
limited to kernel binaries as some user-space application may find it
useful too.
Fragment matching relies on symbol names to identify and register split
function fragments. However, as split fragments are often local symbols,
name aliasing is possible. For such cases, use symbol table to resolve
ambiguities.
This requires the presence of FILE symbols in the input binary. As BOLT
requires non-stripped binary, this is a reasonable assumption. Note that
`strip -g` removes FILE symbols by default, but `--keep-file-symbols`
can be used to preserve them.
Depends on: https://github.com/llvm/llvm-project/pull/89861
Test Plan:
Updated X86/fragment-lite.s
When we rewrite dynamic relocations, there could be cases where they
reference code locations inside functions that were rewritten. When this
happens, we need to precisely map old address to a new one. Until we can
reliably perform the mapping, detect such condition and issue an error
refusing to write a broken binary.