Commit Graph

503 Commits

Author SHA1 Message Date
Amir Ayupov
0227623915 [BOLT][BAT] Support multi-way split functions
BAT writeMaps encoded the assumption that functions are only split into
two fragments (hot and cold). However, BOLT supports splitting into
arbitrary number of fragments. Relax that assumption and look up primary
(hot) fragment explicitly.

Depends on: https://github.com/llvm/llvm-project/pull/86219

Test Plan: Updated bolt/test/X86/yaml-secondary-entry-discriminator.s

Reviewers: ayermolo, rafaelauler, maksfb, dcci

Reviewed By: maksfb, dcci

Pull Request: https://github.com/llvm/llvm-project/pull/87123
2024-04-05 17:50:50 -07:00
Amir Ayupov
2d3c827c05 [BOLT] Use BAT for YAML profile call target information
Provide a mechanism to resolve call target information for calls from non-BAT
functions to BAT functions (`YAMLProfileWriter::convert`). Make it generic for
future use in BAT-to-BAT calls.

Test Plan: Updated bolt/test/X86/bolt-address-translation-yaml.test

Reviewers: ayermolo, maksfb, rafaelauler, dcci

Reviewed By: maksfb

Pull Request: https://github.com/llvm/llvm-project/pull/86219
2024-04-05 16:08:59 -07:00
Maksim Panchenko
7de82ca369 [BOLT] Don't terminate on trap instruction for Linux kernel (#87021)
Under normal circumstances, we terminate basic blocks on a trap
instruction. However, Linux kernel may resume execution after hitting a
trap (ud2 on x86). Thus, we introduce "--terminal-trap" option that will
specify if the trap instruction should terminate the control flow. The
option is on by default except for the Linux kernel mode when it's off.
2024-03-29 16:41:15 -07:00
Maksim Panchenko
35e7d458c9 [BOLT] Add rewriting support for Linux kernel __bug_table (#86908)
Update instruction locations in the __bug_table section after new code
is emitted. If an instruction with associated bug ID was deleted,
overwrite its location with zero.
2024-03-28 10:30:27 -07:00
Amir Ayupov
385e3e26c1 [BOLT] Set EntryDiscriminator in YAML profile for indirect calls
Indirect call handling missed setting an `EntryDiscriminator` while it's
set for direct calls and tail calls.

Improve YAML profile accuracy by unifying the destination setting
between direct and indirect calls into `setCSIDestination` method.

Depends on: https://github.com/llvm/llvm-project/pull/86848

Test Plan: Updated bolt/test/X86/yaml-secondary-entry-discriminator.s

Reviewers: ayermolo, maksfb, rafaelauler

Reviewed By: maksfb

Pull Request: https://github.com/llvm/llvm-project/pull/82128
2024-03-27 16:40:38 -07:00
Amir Ayupov
d8fe2e4bb0 [BOLT] Fix enumeration of secondary entry points
Make them start with 1 instead of 0 (reserved for primary entry point).

Test Plan:
```
bin/llvm-lit -a tools/bolt/test/X86/yaml-secondary-entry-discriminator.s
```

Reviewers: rafaelauler, ayermolo, maksfb, dcci

Reviewed By: maksfb

Pull Request: https://github.com/llvm/llvm-project/pull/86848
2024-03-27 15:23:49 -07:00
Amir Ayupov
213eda157a [BOLT] Add CallSiteInfo entries in YAMLBAT (#76896)
Attach call counters to YAML profile, covering inter-function control
flow.

Depends on: https://github.com/llvm/llvm-project/pull/86218

Test Plan: 
Updated bolt/test/X86/bolt-address-translation-yaml.test
2024-03-25 16:23:21 -07:00
Amir Ayupov
1b763f230a [BOLT] Add secondary entry points to BAT
Provide secondary entry points for `EntryDiscriminator` call info field
in YAML profile.

Increases BAT section size to:
- large binary: 39655300 bytes (1.03x the original),
- medium binary: 3834328 bytes (0.65x),
- small binary: 924 bytes (0.64x).

Depends on: https://github.com/llvm/llvm-project/pull/76911

Test Plan:
- Updated bolt-address-translation{,-yaml}.test
- Added openssl test: https://github.com/rafaelauler/bolt-tests/pull/30

Reviewers: dcci, rafaelauler, maksfb, ayermolo

Reviewed By: rafaelauler

Pull Request: https://github.com/llvm/llvm-project/pull/86218
2024-03-25 15:14:33 -07:00
Amir Ayupov
d7d2f7ca62 [BOLT] Emit intra-function control flow in YAMLBAT
Attach branch counters to YAML profile, covering intra-function control
flow.

Depends on: https://github.com/llvm/llvm-project/pull/86353

Test Plan: Updated bolt/test/X86/bolt-address-translation-yaml.test

Reviewers: rafaelauler, dcci, ayermolo, maksfb

Reviewed By: rafaelauler

Pull Request: https://github.com/llvm/llvm-project/pull/76911
2024-03-23 19:11:49 -07:00
Maksim Panchenko
51268a57fd [BOLT] Enable --keep-nops option for Linux kernel by default (#86349)
Preserve nop instructions in the Linux kernel since they could be used
for runtime patching.
2024-03-22 15:29:26 -07:00
Alexander Yermolovich
f3cfe016c5 [BOLT][DWARF] Add support for cross-cu references for debug-names (#86015)
The DW_AT_abstract_origin can be a cross-cu reference as a by-product of
LTO. On IR level for absolute references an address is stored, vs a DIE
for relative references. Added a map to keep track of cross-cu
referenced DIEs to use when we add an Entry.
2024-03-22 13:48:49 -07:00
Alexander Yermolovich
105feb9ac6 [BOLT][DWARF] Fix handling of DW_TAG_label (#86182)
For DWARF5 BOLT was not retreiving address and instead was setting an
index.
Changed so that an address is used, and added DWARF4 test because it was
missing.
2024-03-22 13:41:27 -07:00
Amir Ayupov
ceba3a38e8 [BOLT] Add number of basic blocks to BAT
YAML profile reader checks the number of basic blocks in regular,
no-stale-matching mode. Add it to BAT.

This increases the size of BAT section to:
- large binary: 39583080 bytes (1.02x of the original),
- medium binary: 3816492 bytes (0.64x),
- small binary: 920 bytes (0.64x, no change due to alignment).

Test Plan: Updated bolt-address-translation-yaml.test

Reviewers: rafaelauler, ayermolo, maksfb, dcci

Reviewed By: rafaelauler

Pull Request: https://github.com/llvm/llvm-project/pull/86045
2024-03-22 08:46:48 -07:00
Amir Ayupov
b0e23639c5 [BOLT] Add BB index to BAT
Add input basic block index to BAT metadata. This addresses the case
where some basic blocks are eliminated, and output index is not equal
to the input block index. These indices are used in non-stale-matching
mode.

Increases BAT section size to:
- large binary: 39521512 bytes (1.02x original),
- medium binary: 3799988 bytes (0.64x),
- small binary: 920 bytes (0.64x).

Test Plan:
Updated bolt-address-translation{,-yaml}.test

Pull Request: https://github.com/llvm/llvm-project/pull/86044
2024-03-22 08:42:58 -07:00
Amir Ayupov
f66d631bf8 Revert "[BOLT] Add BB index to BAT (#86044)"
This reverts commit 3b3de48fd8.
2024-03-22 08:38:40 -07:00
Amir Ayupov
3b3de48fd8 [BOLT] Add BB index to BAT (#86044) 2024-03-22 06:07:17 -07:00
Amir Ayupov
6280681137 [BOLT] Output basic YAML profile in BAT mode
Relax assumptions that YAML output is not supported in BAT mode.
Set up basic infrastructure for emitting YAML for functions not covered
by BAT, such as from `.bolt.org.text` section (code identical to input binary
sans external refs), or non-rewritten functions in non-relocation mode (where
the function stays in the same section but BAT mapping is not emitted).

This diff only produces YAML profile for non-BAT functions (skipped,
non-simple). YAML profile for BAT functions is added in follow-up diffs:
- https://github.com/llvm/llvm-project/pull/76911 emits YAML profile with
  internal control flow information only (branch profile),
- https://github.com/llvm/llvm-project/pull/76896 adds cross-function profile
  (calls profile).

Test Plan: Added bolt/test/X86/bolt-address-translation-yaml.test

Reviewers: ayermolo, dcci, maksfb, rafaelauler

Reviewed By: rafaelauler

Pull Request: https://github.com/llvm/llvm-project/pull/76910
2024-03-21 14:32:13 -07:00
Maksim Panchenko
6b1cf00400 [BOLT] Add support for Linux kernel static keys jump table (#86090)
Runtime code modification used by static keys is the most ubiquitous
self-modifying feature of the Linux kernel. The idea is to to eliminate
the condition check and associated conditional jump on a hot path if
that condition (based on a boolean value of a static key) does not
change often. Whenever they condition changes, the kernel runtime
modifies all code paths associated with that key flipping the code
between nop and (unconditional) jump.
2024-03-21 14:05:21 -07:00
Alexander Yermolovich
3cd988914e [BOLT][DWARF] Fix Test (#86042)
Test was not actually checking bolt binary, and had extra POSTCHECK-NEXT
lines.
2024-03-20 18:11:32 -07:00
Amir Ayupov
ad00e7e5ed [BOLT] Write and parse BF/BB hashes in BAT
This increases BAT section size to:
- large binary: 34832976 bytes (0.90x original),
- medium binary: 3586800 bytes (0.60x original),
- small binary: 816 bytes (0.57x original).

Test Plan: Updated bolt/test/X86/bolt-address-translation.test

Reviewers: rafaelauler, dcci, ayermolo, maksfb

Reviewed By: rafaelauler

Pull Request: https://github.com/llvm/llvm-project/pull/76907
2024-03-20 16:24:21 -07:00
Alexander Yermolovich
4841858862 [BOLT][DWARF] Add support to debug_names for DW_AT_abstract_origin/DW_AT_specification (#85485)
According to the DWARF spec a DIE that has DW_AT_specification or
DW_AT_abstract_origin can be part of .debug_name if a DIE those
attribute points to has DW_AT_name or DW_AT_linkage_name.
2024-03-18 15:28:01 -07:00
Alexander Yermolovich
a4610c7182 [BOLT][DWARF] Add support for DW_IDX_parent (#85285)
This adds support for DW_IDX_parent. If DIE has a parent then
DW_IDX_parent in Entry will point to Entry for that parent DIE.
Otherwise it will have DW_FORM_flag_present in abbrev. Which takes zero
space in Entry.

This came from

https://discourse.llvm.org/t/rfc-improve-dwarf-5-debug-names-type-lookup-parsing-speed/74151
2024-03-15 13:52:45 -07:00
Amir Ayupov
b431546d41 [BOLT] Check BF state in stale matching (#85339)
Only apply stale matching if the binary function is in CFG state, i.e.
has basic blocks.

Test Plan:
Updated bolt/test/X86/reader-stale-yaml.test
2024-03-15 10:55:53 -07:00
Maksim Panchenko
fd32e744a5 [BOLT] Add support for Linux kernel PCI fixup section (#84982)
.pci_fixup section contains a table with entries allowing to invoke a
fixup hook whenever a problem is encountered with a PCI device. The
hookup code typically points to the start of a function. As we are not
relocating functions in the kernel (at least not yet), verify this
assumption while reading the table and ignore any functions with a fixup
code in the middle.
2024-03-12 15:52:27 -07:00
Alexander Yermolovich
6d4aa9d70e [BOLT][DWWARF] Fix foreign TU index with local TUs (#84594)
The foreign TU list immediately follows the local TU list and they both
use the same index, so that if there are N local TU entries, the index
for the first foreign TU is N.

Changed so that the size of local TU is accounted for when setting
foreign TU index.
2024-03-11 12:20:25 -07:00
Maksim Panchenko
a9b0d7590b [BOLT] Properly propagate Cursor errors (#84378)
Handle out-of-bounds reading errors correctly in LinuxKernelRewriter.
2024-03-07 15:29:38 -08:00
Maksim Panchenko
143afb405a [BOLT] Add reading support for Linux kernel .altinstructions section (#84283)
Read .altinstructions and annotate instructions that have alternative
sequences with "AltInst" annotation. Note that some instructions may
have more than one alternatives, in which case they will have multiple
annotations in the form "AltInst", "AltInst2", "AltInst3", etc.
2024-03-07 13:04:02 -08:00
Maksim Panchenko
02629793a4 [BOLT] Add reading support for Linux kernel __bug_table section (#84082)
Read __bug_table section and annotate ud2 instructions with a
corresponding bug entry ID.
2024-03-06 23:34:03 -08:00
Fangrui Song
50bdc6f3ec [BOLT,test] Remove -relax-relocations
The option is always true (see
2aedfdd9b8).
2024-03-06 22:37:18 -08:00
Maksim Panchenko
f51ade25b9 [BOLT] Add reading support for Linux kernel .parainstructions section (#83965)
Read .parainstruction section and mark call instructions with ParaSite
annotations.
2024-03-05 13:57:55 -08:00
Maksim Panchenko
ccf0c8da1a [BOLT] Add reading support for Linux kernel exception table (#83100)
Read Linux exception table and ignore functions with exceptions for now.
Proper support requires an introduction of new control flow since some
instructions with memory access can cause a control flow change.

Hence looking at disassembly or CFG with exceptions annotations is
valuable for code analysis, delay marking functions with exceptions as
non-simple until immediately before emitting the code.
2024-03-04 17:24:16 -08:00
Maksim Panchenko
0e84e2748b [BOLT] Move test under X86 target. NFCI (#83202)
instrument-wrong-target.s test requires X86 host. Move it under
runtime/X86.
2024-02-27 15:38:31 -08:00
Elvina Yakubova
b98e6a5ced [BOLT][AArch64] Skip BBs only instead of functions (#81989)
After [this
](846eb76761)
commit we noticed that the size of fdata file decreased a lot. That's
why the better and more precise way will be to skip basic blocks with
exclusive instructions only instead of the whole function
2024-02-27 19:19:47 +03:00
Alexander Yermolovich
6de5fcc746 [BOLT][DWARF] Add support for .debug_names (#81062)
DWARF5 spec supports the .debug_names acceleration table. This is the
formalized version of combination of gdb-index/pubnames/types. Added
implementation of it to BOLT. It supports both monolothic and split
dwarf, with and without Type Units. It does not include parent indices.
This will be in followup PR. Unlike LLVM output this will put all the
CUs and TUs into one Module.
2024-02-26 14:00:31 -08:00
Alexander Yermolovich
841a4168ad [BOLT] Fix runtime/instrument-wrong-target.s test (#82858)
Test was failing when only X86 was specified for LLVM_TARGETS_TO_BUILD.
Changed so that it will now report unsupporeted.

For "X86;AArch64" it still passes.
For "X86" reports UNSUPPORTED: BOLT :: runtime/instrument-wrong-target.s
(1 of 1)
2024-02-26 13:43:39 -08:00
Maksim Panchenko
2646dccaa3 [BOLT] Add support for Linux kernel static calls table (#82072)
Static calls are calls that are getting patched during runtime. Hence,
for every such call the kernel runtime needs the location of the call or
jmp instruction that will be patched. Instruction locations together
with a corresponding key are stored in the static call site table. As
BOLT rewrites these instructions it needs to update the table.
2024-02-18 17:20:25 -08:00
Maksim Panchenko
5a29887145 [BOLT] Add writing support for Linux kernel ORC (#80950)
Update ORC information based on the new code layout and emit
corresponding ORC sections for the Linux kernel.

We rewrite ORC sections in place, which puts a limit on the size of new
section contents. Since ORC info changes for the new code layout and the
number of ORC entries can become larger, we free up space in the tables
by removing redundant ORC terminators. As a result, we effectively emit
fewer entries and have to add duplicate terminators at the end to match
the original section sizes. Ideally, we need to update ORC boundaries to
reflect the reduced size and optimize runtime lookup, but we will need
relocations for this, and the benefits will be marginal, if any.
2024-02-16 14:25:59 -08:00
Alexander Yermolovich
5ff8b30327 [BOLT][DWARF] Do not emit zero low_pc address arange (#81955)
According to DWARF spec zero entires indicate end of arange. Changed so
that BOLT does not emit zero low_pc arange.
2024-02-16 11:23:28 -08:00
Alexander Yermolovich
82ca752393 [BOLT][DWARF] Add test for DW_AT_ranges input without function output (#81794)
Added a test that relies on -fbasic-block-sections=all and --gc-sections
that exercises a code path that previously printed a warning.
2024-02-14 15:43:39 -08:00
Alexander Yermolovich
c9e8e91aca [BOLT][DWARF] Fix out of order rangelists/loclists (#81645)
GCC can generate rangelists/loclists that are out of order. Fixed so
that we don't assert, and instead generate partially optimized list.
Through most code paths we do sort rnglists/loclists, but not for
loclist for a path where BOLT does not modify a function. Although it's
nice to have lists sorted, this implementation shouldn't rely on it.
This also fixes an issue if we partially capture a list we would write
out *end_of_list in helper function. So tools won't see the rest of the
addresses being written out.
2024-02-14 11:23:57 -08:00
Amir Ayupov
52cf07116b [BOLT][NFC] Log through JournalingStreams (#81524)
Make core BOLT functionality more friendly to being used as a
library instead of in our standalone driver llvm-bolt. To
accomplish this, we augment BinaryContext with journaling streams
that are to be used by most BOLT code whenever something needs to
be logged to the screen. Users of the library can decide if logs
should be printed to a file, no file or to the screen, as
before. To illustrate this, this patch adds a new option
`--log-file` that allows the user to redirect BOLT logging to a
file on disk or completely hide it by using
`--log-file=/dev/null`. Future BOLT code should now use
`BinaryContext::outs()` for printing important messages instead of
`llvm::outs()`. A new test log.test enforces this by verifying that
no strings are print to screen once the `--log-file` option is
used.

In previous patches we also added a new BOLTError class to report
common and fatal errors, so code shouldn't call exit(1) now. To
easily handle problems as before (by quitting with exit(1)),
callers can now use
`BinaryContext::logBOLTErrorsAndQuitOnFatal(Error)` whenever code
needs to deal with BOLT errors. To test this, we have fatal.s
that checks we are correctly quitting and printing a fatal error
to the screen.

Because this is a significant change by itself, not all code was
yet ported. Code from Profiler libs (DataAggregator and friends)
still print errors directly to screen.

Co-authored-by: Rafael Auler <rafaelauler@fb.com>

Test Plan: NFC
2024-02-12 14:53:53 -08:00
Jon Roelofs
b98db441f0 [BOLT] Make ifunc test not statically-resolvable. NFC
This fixes a breakage caused by e976385415
2024-02-06 15:15:11 -08:00
Maksim Panchenko
a693ae5306 [BOLT] Enable re-writing of Linux kernel binary (#80228)
Write modified Linux kernel binary to disk. The output is not supposed
to be functional at the moment, but it will allow for future patches to
test the output binary.
2024-02-01 12:11:26 -08:00
Maksim Panchenko
116e801a15 [BOLT] Adjust section sizes based on file offsets (#80226)
When we adjust section sizes while rewriting a binary, we should be
using section offsets and not addresses to determine if section overlap.
NFC for existing binaries.
2024-02-01 12:08:41 -08:00
Maksim Panchenko
2abcbbd96a [BOLT] Detect Linux kernel based on ELF program headers (#80086)
Check if program header addresses fall into the kernel space to detect a
Linux kernel binary on x86-64.

Delete opts::LinuxKernelMode and use BinaryContext::IsLinuxKernel
instead.
2024-01-30 18:04:29 -08:00
Maksim Panchenko
0fc791cd2c [BOLT] Fix comparison function for Linux ORC entries (#79921)
Fix ORC entry comparison function to cover a case with multiple
terminator entries matching at the same IP.
2024-01-29 17:45:40 -08:00
Amir Ayupov
df7d2b2f90 [BOLT] Deduplicate equal offsets in BAT (#76905)
Encode BRANCHENTRY bits as bitmask for deduplicated entries.

Reduces BAT section size:
- large binary: to 11834216 bytes (0.31x original),
- medium binary: to 1565584 bytes (0.26x original),
- small binary: to 336 bytes (0.23x original).

Test Plan: Updated bolt/test/X86/bolt-address-translation.test
2024-01-25 15:37:47 -08:00
Alexander Yermolovich
7d272722fb [BOLT][DWARF] Add option to specify DW_AT_comp_dir (#79395)
Added an --comp-dir-override option that overrides DW_AT_comp_dir in the
unit die. This allows for llvm-bolt to be invoked from any category and
still find .dwo files.
2024-01-25 15:00:52 -08:00
Amir Ayupov
e9309b27d7 [BOLT] Report input staleness (#79496)
It's beneficial to have uniform reporting in both `infer-stale-profile`
on and off cases, primarily for logging purposes.

Without this change, BOLT would report "input" staleness in
`infer-stale-profile=0` case (without matching), and "output" staleness
in `infer-stale-profile=1` case (after matching).

This change makes BOLT report "input" staleness in both cases. "Output"
staleness information is printed separately with "BOLT-INFO: inferred
profile..."
2024-01-25 14:15:13 -08:00
Alexander Yermolovich
bb6a485055 [BOLT] Fix updating DW_AT_stmt_list for DWARF5 TUs (#79374)
Changed so that we also update DW_AT_stmt_list for DWARF5 TUs. BOLT was
doing it for DWARF4, but it wasn't doing it for DWARF5.
2024-01-24 15:34:29 -08:00