Commit Graph

442 Commits

Author SHA1 Message Date
Nikita Popov
e7244d8659 [BOLT][CMake] Don't export bolt libraries in LLVMExports.cmake (#121936)
Bolt makes use of add_llvm_library and as such ends up exporting its
libraries from LLVMExports.cmake, which is not correct.

Bolt doesn't have its own exports file, and I assume that there is no
desire to have one either -- Bolt libraries are not intended to be
consumed as a cmake module, right?

As such, this PR adds a NO_EXPORT option to simplify exclude these
libraries from the exports file.
2025-01-08 09:41:09 +01:00
Peter Waller
aa9cc721e5 Reapply "[BOLT] Add --pad-funcs-before=func:n (#117924)" (#121918)
- **Reapply "[BOLT] Add --pad-funcs-before=func:n (#117924)"**
- **[BOLT] Fix --pad-funcs{,-before} state misinteraction**

When --pad-funcs-before was introduced, it introduced a bug whereby the
first one to get parsed could influence the other.

Ensure that each has its own state and test that they don't interact in
this manner by testing how the `_subsequent` symbol moves when both
arguments are supplied with different padding values.

Fixed by having a function (and static state) for each of before/after.
2025-01-07 17:25:04 +00:00
Amir Ayupov
be21bd9bbf Revert "[BOLT] Add --pad-funcs-before=func:n (#117924)"
14dcf8214f introduced a subtle bug with
the static `FunctionPadding` map.

If either `opts::FunctionPadSpec` or `opts::FunctionPadBeforeSpec` are set,
the map is going to be populated with the respective spec in the first
invocation of `BinaryEmitter::emitFunction`. The subsequent invocations
will pick up the padding from the map irrespective of whether
`opts::FunctionPadSpec` or `opts::FunctionPadBeforeSpec` is passed as a
parameter.

This breaks an internal test, hence reverting the patch.
2025-01-06 12:57:43 -08:00
Franklin
6e8a1a45a7 [BOLT] Detect Linux kernel version if the binary is a Linux kernel (#119088)
This makes it easier to handle differences (e.g. of exception table
entry size) between versions of Linux kernel
2024-12-26 09:54:23 -08:00
Alexey Moksyakov
e11d49cbf5 [BOLT][AArch64] Adds tls relocations support (#117465)
Co-authored-by: yavtuk <yavtuk@ya.ru>
2024-12-20 15:54:36 +03:00
Kristof Beyls
4111841f88 [BOLT] Correctly print preferred disassembly for annotated instructions (#120564)
This patch makes sure that `BinaryContext::printInstruction` prints the
preferred disassembly. Preferred disassembly only gets printed when
there are no annotations on the MCInst. Therefore, this patch
temporarily removes the annotations before printing it.

A few examples of before and after on AArch64 instructions are as
follows:

```
  BEFORE                     AFTER
                             (preferred disassembly)

  ret   x30                  ret
  orr   x30, xzr, x0         mov   x30, x0
  hint  #29                  autiasp
  hint  #12                  autia1716
```

Clearly, the preferred disassembly is easier for developers to read, and
is the disassembly that tools should be printing.

This patch is motivated as part of future work on the
llvm-bolt-binary-analysis tool, making sure that the reports it prints
do use preferred disassembly.

This patch was cherry-picked from
https://github.com/kbeyls/llvm-project/tree/bolt-gadget-scanner-prototype.

In this current patch, this only affects existing RISCV test cases.

This patch also does improve test cases in future patches that will
introduce a binary analysis for llvm-bolt-binary-analysis that checks
for correct application of pac-ret (pointer authentication on return
addresses).
2024-12-20 08:54:07 +00:00
Alexander Yermolovich
3c357a49d6 [BOLT] Add support for safe-icf (#116275)
Identical Code Folding (ICF) folds functions that are identical into one
function, and updates symbol addresses to the new address. This reduces
the size of a binary, but can lead to problems. For example when
function pointers are compared. This can be done either explicitly in
the code or generated IR by optimization passes like Indirect Call
Promotion (ICP). After ICF what used to be two different addresses
become the same address. This can lead to a different code path being
taken.

This is where safe ICF comes in. Linker (LLD) does it using address
significant section generated by clang. If symbol is in it, or an object
doesn't have this section symbols are not folded.

BOLT does not have the information regarding which objects do not have
this section, so can't re-use this mechanism.

This implementation scans code section and conservatively marks
functions symbols as unsafe. It treats symbols as unsafe if they are
used in non-control flow instruction. It also scans through the data
relocation sections and does the same for relocations that reference a
function symbol. The latter handles the case when function pointer is
stored in a local or global variable, etc. If a relocation address
points within a vtable these symbols are skipped.
2024-12-16 21:49:53 -08:00
Alexander Yermolovich
331c2dd8b4 [BOLT][DWARF] Add support for DW_OP_GNU_push_tls_address to .debug_names (#119939)
Added support to BOLT for DW_OP_GNU_push_tls_address. So now
DW_TAG_variable with this OP in DW_AT_location will appear in debug
names acceleration table. Although not in the DWARF 5 spec it is similar
to DW_OP_form_tls_address. Without this support llvm-dwarfdump --verify
--debug-names will report errors.
2024-12-14 09:30:25 -08:00
Maksim Panchenko
b560b87ba1 [BOLT] Clean up jump table handling in non-reloc mode. NFCI (#119614)
This change affects non-relocation mode only. Prior to having
CheckLargeFunctions pass, we could have emitted code for functions that
was discarded at the end due to size limitations. Since we didn't know
at the time of emission if the code would be discarded or not, we had to
emit jump tables in separate sections and handle them separately.
However, now we always run CheckLargeFunctions and make sure all emitted
code is used. Thus, we can get rid of the special jump table handling.
2024-12-13 13:14:02 -08:00
Alexander Yermolovich
4b825c7417 [BOLT][DWARF] Add support for transitive DW_AT_name/DW_AT_linkage_name resolution for DW_AT_name/DW_AT_linkage_name. (#119493)
This fix handles a case where a DIE that does not have
DW_AT_name/DW_AT_linkage_name, but has a reference to another DIE using
DW_AT_abstract_origin/DW_AT_specification. It also fixes a bug where
there are cross CU references for those attributes. Previously it would
use a DWARF Unit of a DIE which was being processed The
warf5-debug-names-cross-cu.s test just happened to work because how it
was constructed where string section was shared by both DWARF Units.

To resolve DW_AT_name/DW_AT_linkage_name this patch iterates over
references until it either reaches the final DIE or finds both of those
names.
2024-12-11 14:27:56 -08:00
Peter Waller
14dcf8214f [BOLT] Add --pad-funcs-before=func:n (#117924)
This complements --pad-funcs, and by using both simultaneously, enables
moving a specific function through the address space without modifying
any code
other than the targeted function (and references to it) by doing
(before+after=constant).

See also: proposed functionality to enable inserting random padding in

https://discourse.llvm.org/t/rfc-lld-feature-for-controlling-for-code-size-dependent-measurement-bias
and https://github.com/llvm/llvm-project/pull/117653
2024-12-11 09:58:52 +00:00
Alexander Yermolovich
50c0e679b9 [BOLT][DWARF] Add support for DW_TAG_union_type to DebugNames. (#119023)
Adding support for DW_TAG_union_type for DebugNames acceleration tables.
2024-12-06 15:45:52 -08:00
Jared Wyles
2ccf7ed277 [JITLink] Switch to SymbolStringPtr for Symbol names (#115796)
Use SymbolStringPtr for Symbol names in LinkGraph. This reduces string interning
on the boundary between JITLink and ORC, and allows pointer comparisons (rather
than string comparisons) between Symbol names. This should improve the
performance and readability of code that bridges between JITLink and ORC (e.g.
ObjectLinkingLayer and ObjectLinkingLayer::Plugins).

To enable use of SymbolStringPtr a std::shared_ptr<SymbolStringPool> is added to
LinkGraph and threaded through to its construction sites in LLVM and Bolt. All
LinkGraphs that are to have symbol names compared by pointer equality must point
to the same SymbolStringPool instance, which in ORC sessions should be the pool
attached to the ExecutionSession.
---------

Co-authored-by: Lang Hames <lhames@gmail.com>
2024-12-06 10:22:09 +11:00
Enna1
4d2bc0adc6 [BOLT] Extract comparator for sorting functions by index into helper function (#116217)
This change extracts the comparator for sorting functions by index into
a helper function `compareBinaryFunctionByIndex()`

Not sure why the comparator used in
`BinaryContext::getSortedFunctions()` is not same as the other two
places. I think they should use the same comparator, so I also change
`BinaryContext::getSortedFunctions()` to use
`compareBinaryFunctionByIndex()` for sorting functions.
2024-11-27 09:01:12 +08:00
Maksim Panchenko
92301180f7 [BOLT] Use compact EH format for fixed-address executables (#117274)
Use ULEB128 format for emitting LSDAs for fixed-address executables,
similar to what we use for PIEs/DSOs. Main difference is that we don't
use landing pad trampolines when landing pads are not contained in a
single fragment. Instead, we fallback to emitting larger fixed-address
LSDAs, which is still better than adding trampoline instructions.
2024-11-22 00:28:55 -08:00
Maksim Panchenko
105ecd8bb2 [BOLT] Avoid EH trampolines for PIEs/DSOs (#117106)
We used to emit EH trampolines for PIE/DSO whenever a function fragment
contained a landing pad outside of it. However, it is common to have all
landing pads in a cold fragment even when their throwers are in a hot
one.

To reduce the number of trampolines, analyze landing pads for any given
function fragment, and if they all belong to the same (possibly
different) fragment, designate that fragment as a landing pad fragment
for the "thrower" fragment. Later, emit landing pad fragment symbol as
an LPStart for the thrower LSDA.
2024-11-21 18:18:30 -08:00
Maksim Panchenko
3282be1f8d [BOLT] Use ULEB128 encoding for PIE/DSO exception tables (#116911)
Use ULEB128 encoding for call sites in PIE/DSO binaries. The encoding
reduces the size of the tables compared to sdata4 and is the default
format used by Clang.

Note that for fixed-address executables we still use absolute addressing
to cover cases where landing pads can reside in different function
fragments.

For testing, we rely on runtime EH tests.
2024-11-20 12:29:23 -08:00
Maksim Panchenko
066dd91ad8 [BOLT] Offset LPStart to avoid unnecessary instructions (#116713)
For C++ exception handling, when we write a call site table, we must
avoid emitting 0-value offsets for landing pads unless the call site has
no landing pad. However, 0 can be a real offset from the start of the
FDE if the FDE corresponds to a function fragment that starts with a
landing pad. In such cases, we used to emit a trap instruction at the
start of the fragment to guarantee non-zero LP offset.

To avoid emitting unnecessary trap instructions, we can instead set
LPStart to an offset from the FDE. If we emit it as [FDEStart - 1], then
all real offsets from LPStart in FDE become non-negative.
2024-11-19 16:45:03 -08:00
Maksim Panchenko
996553228f [BOLT] Overwrite .eh_frame and .gcc_except_table (#116755)
Under --use-old-text or --strict, we completely rewrite contents of EH
frames and exception tables sections. If new contents of either section
do not exceed the size of the original section, rewrite the section
in-place.
2024-11-19 12:59:05 -08:00
Maksim Panchenko
93a4244523 [BOLT] Use new assembler directives for EH table emission (#116294)
When emitting C++ exception tables (LSDAs), BOLT used to estimate the
size of the tables beforehand. This implementation was necessary as the
assembler/streamer lacked the emitULEB128IntValue() functionality.

As I plan to introduce [u|s]uleb128-encoded exception tables in BOLT,
now is a perfect time to switch to the new API and eliminate the need
to pre-compute the size of the tables.
2024-11-17 12:40:07 -08:00
Maksim Panchenko
1b8e0cf090 [BOLT] Never emit "large" functions (#115974)
"Large" functions are functions that are too big to fit into their
original slots after code modifications. CheckLargeFunctions pass is
designed to prevent such functions from emission. Extend this pass to
work with functions with constant islands.

Now that CheckLargeFunctions covers all functions, it guarantees that we
will never see such functions after code emission on all platforms
(previously it was guaranteed on x86 only). Hence, we can get rid of
RewriteInstance extensions that were meant to support "large" functions.
2024-11-13 09:58:44 -08:00
Maksim Panchenko
d922045381 [BOLT] Use AsmInfo for address size. NFCI (#115932)
Use AsmInfo instead of DWARFObj interface for extracting address size
and format.
2024-11-12 11:53:34 -08:00
Daniel Sanders
74003f11b3 [mc] Add CFI directive to emit val_offset() rules (#113971)
These specify that the value of the given register in the previous frame
is the CFA plus some offset. This isn't very common but can be necessary
if the original value is normally reconstructed from the stack/frame
pointer instead of being saved on the stack and reloaded from there.
2024-11-11 11:38:36 -08:00
Maksim Panchenko
49ee6069db [BOLT][AArch64] Add support for compact code model (#112110)
Add `--compact-code-model` option that executes alternative branch
relaxation with an assumption that the resulting binary has less than
128MB of code. The relaxation is done in `relaxLocalBranches()`, which
operates on a function level and executes on multiple functions in
parallel.

Running the new option on AArch64 Clang binary produces slightly smaller
code and the relaxation finishes in about 1/10th of the time.

Note that the new `.text` has to be smaller than 128MB, *and* `.plt` has
to be closer than 128MB to `.text`.
2024-11-07 14:51:12 -08:00
Sergei Barannikov
eeb987f6f3 [MC] Make generated MCInstPrinter::getMnemonic const (NFC) (#114682)
The value returned from the function depends only on the instruction opcode.

As a drive-by, change the type of the argument to const-reference.
2024-11-03 20:37:26 +03:00
Kazu Hirata
41baa69a7e [BOLT] Fix warnings (#114116)
This patch fixes:

  bolt/lib/Core/BinaryFunction.cpp:2537:13: error: enumeration value
  'OpNegateRAStateWithPC' not handled in switch [-Werror,-Wswitch]

  bolt/lib/Core/BinaryFunction.cpp:2661:13: error: enumeration value
  'OpNegateRAStateWithPC' not handled in switch [-Werror,-Wswitch]

  bolt/lib/Core/BinaryFunction.cpp:2805:13: error: enumeration value
  'OpNegateRAStateWithPC' not handled in switch [-Werror,-Wswitch]
2024-10-29 13:52:22 -07:00
Kazu Hirata
6803062eb7 [BOLT] Fix a build failure
This patch fixes:

  bolt/lib/Core/DIEBuilder.cpp:285:40: error: too many arguments to
  function call, expected 2, have 3
2024-10-22 10:20:20 -07:00
sinan
c3bbc3a57d [BOLT] Fix logs with no hex convension (#112650)
Add `utohexstr` to ensure that offsets/addresses are correctly formatted
as hexadecimal values.
2024-10-18 09:46:41 +08:00
Kazu Hirata
7928e14f5e [BOLT] Avoid repeated map lookups (NFC) (#112118) 2024-10-12 22:06:49 -07:00
Kazu Hirata
b192f208d6 [BOLT] Avoid repeated hash lookups (NFC) (#112073) 2024-10-12 08:03:39 -07:00
Maksim Panchenko
4db0cc4c55 [BOLT] Allow sections in --print-only flag (#109622)
While printing functions, expand --print-only flag to accept section
names. E.g., "--print-only=\.init" will only print functions from
".init" section.
2024-09-25 23:44:06 +02:00
Kristof Beyls
6d216fb7b8 [perf2bolt] Improve heuristic to map in-process addresses to specific… (#109397)
… segments in Elf binary.

The heuristic is improved by also taking into account that only
executable segments should contain instructions.

Fixes #109384.
2024-09-23 15:14:51 +02:00
Maksim Panchenko
abd69b3653 [BOLT] Handle internal calls in ValidateInternalCalls (#105736)
Move handling of all internal calls into the designated pass. Preserve
NOPs and mark functions as non-simple on non-X86 platforms.
2024-08-27 11:31:32 -07:00
ShatianWang
cbd302410e [BOLT] Improve BinaryFunction::inferFallThroughCounts() (#105450)
This PR improves how basic block execution count is updated when using
the BOLT option `-infer-fall-throughs`. Previously, if a 0-count
fall-through edge is assigned a positive inferred count N, then the
successor block's execution count will be incremented by N. Since the
successor's execution count is calculated using information besides
inflow sum (such as outflow sum), it likely is already correct, and
incrementing it by an additional N would be wrong. This PR improves how
the successor's execution count is updated by using the max over its
current count and N.
2024-08-21 00:35:07 -04:00
Maksim Panchenko
8f3050684e [BOLT] Reduce CFI warning verbosity (#105336)
CFI programs may have more saves than restores and this is completely
benign from BOLT's perspective. Reduce the verbosity and print the
warning only under `-v=1` and above.
2024-08-20 13:41:19 -07:00
Sayhaan Siddiqui
6aad62cf5b [BOLT][DWARF] Add parallelization for processing of DWO debug information (#100282)
Enables parallelization for the processing of DWO CUs.
2024-08-08 16:41:51 -07:00
Davide Italiano
e49549ff19 Revert "[BOLT] Abort on out-of-section symbols in GOT (#100801)"
This reverts commit a4900f0d93.
2024-08-07 20:52:19 -07:00
Sayhaan Siddiqui
62e894e0d7 [BOLT][DWARF][NFC] Move Arch assignment out of createBinaryContext (#102054)
Moves the assignment of Arch out of createBinaryContext to prevent data
races when parallelized.
2024-08-07 16:55:39 +00:00
Vladislav Khmelevsky
a4900f0d93 [BOLT] Abort on out-of-section symbols in GOT (#100801)
This patch aborts BOLT execution if it finds out-of-section (section
end) symbol in GOT table. In order to handle such situations properly in
future, we would need to have an arch-dependent way to analyze
relocations or its sequences, e.g., for ARM it would probably be ADRP +
LDR analysis in order to get GOT entry address. Currently, it is also
challenging because GOT-related relocation symbols are replaced to
__BOLT_got_zero. Anyway, it seems to be quite a rare case, which seems
to be only? related to static binaries. For the most part, it seems that
it should be handled on the linker stage, since static binary should not
have GOT table at all. LLD linker with relaxations enabled would replace
instruction addresses from GOT directly to target symbols, which
eliminates the problem.

Anyway, in order to achieve detection of such cases, this patch fixes a
few things in BOLT:
1. For the end symbols, we're now using the section provided by ELF
binary. Previously it would be tied with a wrong section found by symbol
address.
2. The end symbols would have limited registration we would only
add them in name->data GlobalSymbols map, since using address->data
BinaryDataMap map would likely be impossible due to address duality of
such symbols.
3. The outdated BD->getSection (currently returning refence, not
pointer) check in postProcessSymbolTable is replaced by getSize check in
order to allow zero-sized top-level symbols if they are located in
zero-sized sections. For the most part, such things could only be found
in tests, but I don't see a reason not to handle such cases.
4. Updated section-end-sym test and removed x86_64 requirement since
there is no reason for this (tested on aarch64 linux)

The test was provided by peterwaller-arm (thank you) in #100096 and
slightly modified by me.
2024-08-07 16:26:12 +04:00
Amir Ayupov
f83a89c1b1 [BOLT] Turn non-empty CFI StateStack assert into a warning (#102216)
clang-15 can produce binaries with mismatched RememberState/RestoreState
CFIs. This is benign for unwinding, so replace an assert with a warning.
2024-08-06 17:23:43 -07:00
Sayhaan Siddiqui
910012e7c5 [BOLT][DWARF][NFC] Split DIEBuilder::finish (#101244)
Split DIEBuilder::finish so that code updating .debug_names is in a
separate function.
2024-07-31 13:41:38 -07:00
Sayhaan Siddiqui
33960ce5a8 [BOLT][DWARF] Sort GDBIndexTUEntryVector (#101264)
Sorts GDBIndexTUEntryVector in decreasing order by hash to ensure
determinism when parallelized.
2024-07-31 11:35:38 -07:00
Sayhaan Siddiqui
9a3e66e314 [BOLT][DWARF][NFC] Fix DebugStrOffsetsWriter (#100672)
Fix DebugStrOffsetsWriter so updateAddressMap can't be called after it
is finalized.
2024-07-26 18:58:25 -07:00
Amir Ayupov
9d2dd009b6 [BOLT] Support more than two jump table parents
Multi-way splitting can cause multiple fragments to access the same jump
table. Relax the assumption that a jump table can only have up to two
parents.

Test Plan: added bolt/test/X86/three-way-split-jt.s

Reviewers: ayermolo, dcci, rafaelauler, maksfb

Reviewed By: rafaelauler, dcci

Pull Request: https://github.com/llvm/llvm-project/pull/99988
2024-07-24 07:16:39 -07:00
Amir Ayupov
83ea7ce3a1 [BOLT][NFC] Track fragment relationships using EquivalenceClasses
Three-way splitting can create references between split fragments (warm
to cold or vice versa) that are not handled by
`isChildOf/isParentOf/isChildOrParentOf`. Generalize fragment
relationships to allow checking if two functions belong to one group,
potentially in presence of ICF which can join multiple groups.

Test Plan: NFC for existing tests

Reviewers: maksfb, ayermolo, rafaelauler, dcci

Reviewed By: rafaelauler

Pull Request: https://github.com/llvm/llvm-project/pull/99979
2024-07-24 07:15:10 -07:00
Jordan Brantner
d251a328b8 [BOLT] Fix typo from alterantive to alternative (#99704)
Fix typo from `alterantive` -> `alternative`

Signed-off-by: Jordan Brantner <brantnej@oregonstate.edu>
2024-07-22 18:35:20 -07:00
Fangrui Song
86e21e1af2 [BOLT] Remove unused bool arguments from createMCObjectStreamer callers 2024-07-20 21:30:49 -07:00
Amir Ayupov
3023b15fb1 [BOLT] Support POSSIBLE_PIC_FIXED_BRANCH
Detect and support fixed PIC indirect jumps of the following form:
```
movslq  En(%rip), %r1
leaq  PIC_JUMP_TABLE(%rip), %r2
addq  %r2, %r1
jmpq  *%r1
```

with PIC_JUMP_TABLE that looks like following:

```
  JT:  ----------
   E1:| L1 - JT  |
      |----------|
   E2:| L2 - JT  |
      |----------|
      |          |
         ......
   En:| Ln - JT  |
       ----------
```

The code could be produced by compilers, see
https://github.com/llvm/llvm-project/issues/91648.

Test Plan: updated jump-table-fixed-ref-pic.test

Reviewers: maksfb, ayermolo, dcci, rafaelauler

Reviewed By: rafaelauler

Pull Request: https://github.com/llvm/llvm-project/pull/91667
2024-07-18 20:57:05 -07:00
Pavel Labath
09cbb45edd [BOLT][DWARF][NFC] A better DIEBuilder for the llvm API change in #98905 (#99324)
The caller (cloneAttribute) already switches on the reference type. By
aligning the cases with the retrieval functions, we can avoid branching
twice.
2024-07-18 09:46:29 +02:00
Pavel Labath
9dab91247d Fix bolt for #98905 2024-07-16 13:29:00 +02:00