Provide a mechanism to resolve call target information for calls from non-BAT
functions to BAT functions (`YAMLProfileWriter::convert`). Make it generic for
future use in BAT-to-BAT calls.
Test Plan: Updated bolt/test/X86/bolt-address-translation-yaml.test
Reviewers: ayermolo, maksfb, rafaelauler, dcci
Reviewed By: maksfb
Pull Request: https://github.com/llvm/llvm-project/pull/86219
Under normal circumstances, we terminate basic blocks on a trap
instruction. However, Linux kernel may resume execution after hitting a
trap (ud2 on x86). Thus, we introduce "--terminal-trap" option that will
specify if the trap instruction should terminate the control flow. The
option is on by default except for the Linux kernel mode when it's off.
Attach call counters to YAML profile, covering inter-function control
flow.
Depends on: https://github.com/llvm/llvm-project/pull/86218
Test Plan:
Updated bolt/test/X86/bolt-address-translation-yaml.test
Provide secondary entry points for `EntryDiscriminator` call info field
in YAML profile.
Increases BAT section size to:
- large binary: 39655300 bytes (1.03x the original),
- medium binary: 3834328 bytes (0.65x),
- small binary: 924 bytes (0.64x).
Depends on: https://github.com/llvm/llvm-project/pull/76911
Test Plan:
- Updated bolt-address-translation{,-yaml}.test
- Added openssl test: https://github.com/rafaelauler/bolt-tests/pull/30
Reviewers: dcci, rafaelauler, maksfb, ayermolo
Reviewed By: rafaelauler
Pull Request: https://github.com/llvm/llvm-project/pull/86218
Hide the implementations of `FuncHashes` and `BBHashMap` classes,
getting rid of `at` accessors that could throw an exception.
Test Plan: NFC
Reviewers: ayermolo, maksfb, dcci, rafaelauler
Reviewed By: rafaelauler
Pull Request: https://github.com/llvm/llvm-project/pull/86353
The DW_AT_abstract_origin can be a cross-cu reference as a by-product of
LTO. On IR level for absolute references an address is stored, vs a DIE
for relative references. Added a map to keep track of cross-cu
referenced DIEs to use when we add an Entry.
YAML profile reader checks the number of basic blocks in regular,
no-stale-matching mode. Add it to BAT.
This increases the size of BAT section to:
- large binary: 39583080 bytes (1.02x of the original),
- medium binary: 3816492 bytes (0.64x),
- small binary: 920 bytes (0.64x, no change due to alignment).
Test Plan: Updated bolt-address-translation-yaml.test
Reviewers: rafaelauler, ayermolo, maksfb, dcci
Reviewed By: rafaelauler
Pull Request: https://github.com/llvm/llvm-project/pull/86045
Add input basic block index to BAT metadata. This addresses the case
where some basic blocks are eliminated, and output index is not equal
to the input block index. These indices are used in non-stale-matching
mode.
Increases BAT section size to:
- large binary: 39521512 bytes (1.02x original),
- medium binary: 3799988 bytes (0.64x),
- small binary: 920 bytes (0.64x).
Test Plan:
Updated bolt-address-translation{,-yaml}.test
Pull Request: https://github.com/llvm/llvm-project/pull/86044
Relax assumptions that YAML output is not supported in BAT mode.
Set up basic infrastructure for emitting YAML for functions not covered
by BAT, such as from `.bolt.org.text` section (code identical to input binary
sans external refs), or non-rewritten functions in non-relocation mode (where
the function stays in the same section but BAT mapping is not emitted).
This diff only produces YAML profile for non-BAT functions (skipped,
non-simple). YAML profile for BAT functions is added in follow-up diffs:
- https://github.com/llvm/llvm-project/pull/76911 emits YAML profile with
internal control flow information only (branch profile),
- https://github.com/llvm/llvm-project/pull/76896 adds cross-function profile
(calls profile).
Test Plan: Added bolt/test/X86/bolt-address-translation-yaml.test
Reviewers: ayermolo, dcci, maksfb, rafaelauler
Reviewed By: rafaelauler
Pull Request: https://github.com/llvm/llvm-project/pull/76910
Runtime code modification used by static keys is the most ubiquitous
self-modifying feature of the Linux kernel. The idea is to to eliminate
the condition check and associated conditional jump on a hot path if
that condition (based on a boolean value of a static key) does not
change often. Whenever they condition changes, the kernel runtime
modifies all code paths associated with that key flipping the code
between nop and (unconditional) jump.
The function is to be used by YAML profile emission in BAT mode for
BinaryFunctions not covered by BAT tables (same as in original binary).
Test Plan: NFC
Reviewers: rafaelauler, ayermolo, dcci, maksfb
Reviewed By: dcci
Pull Request: https://github.com/llvm/llvm-project/pull/76909
Refactor MCPlusBuilder's create{Instruction}() functions that used to
return bool. We almost never check the return value as we rely on
llvm_unreachable() to detect unimplemented functionality. There were a
couple of cases that checked the return value, but they would hit the
unreachable condition first (at least in debug builds) before the return
value gets checked.
Add an external interface to register a branch in a function that is in
disassembled state. Allows to make custom modifications to the
disassembler. E.g., a pre-CFG pass can add an instruction and register a
branch that will later be used during the CFG construction.
To avoid accidentally setting the label twice for the same instruction,
which can lead to a "lost" label, introduce getOrSetInstLabel()
function. Rename existing functions to getInstLabel()/setInstLabel() to
make it explicit that they operate on instruction labels. Add an
assertion in setInstLabel() that the instruction did not have a prior
label set.
After [this
](846eb76761)
commit we noticed that the size of fdata file decreased a lot. That's
why the better and more precise way will be to skip basic blocks with
exclusive instructions only instead of the whole function
DWARF5 spec supports the .debug_names acceleration table. This is the
formalized version of combination of gdb-index/pubnames/types. Added
implementation of it to BOLT. It supports both monolothic and split
dwarf, with and without Type Units. It does not include parent indices.
This will be in followup PR. Unlike LLVM output this will put all the
CUs and TUs into one Module.
The change in #80950 exposed a memory leak in BinarySection. Let
BinarySection manage memory passed via updateContents() unless a valid
SectionID is set indicating that the contents are managed by JITLink.
Make core BOLT functionality more friendly to being used as a
library instead of in our standalone driver llvm-bolt. To
accomplish this, we augment BinaryContext with journaling streams
that are to be used by most BOLT code whenever something needs to
be logged to the screen. Users of the library can decide if logs
should be printed to a file, no file or to the screen, as
before. To illustrate this, this patch adds a new option
`--log-file` that allows the user to redirect BOLT logging to a
file on disk or completely hide it by using
`--log-file=/dev/null`. Future BOLT code should now use
`BinaryContext::outs()` for printing important messages instead of
`llvm::outs()`. A new test log.test enforces this by verifying that
no strings are print to screen once the `--log-file` option is
used.
In previous patches we also added a new BOLTError class to report
common and fatal errors, so code shouldn't call exit(1) now. To
easily handle problems as before (by quitting with exit(1)),
callers can now use
`BinaryContext::logBOLTErrorsAndQuitOnFatal(Error)` whenever code
needs to deal with BOLT errors. To test this, we have fatal.s
that checks we are correctly quitting and printing a fatal error
to the screen.
Because this is a significant change by itself, not all code was
yet ported. Code from Profiler libs (DataAggregator and friends)
still print errors directly to screen.
Co-authored-by: Rafael Auler <rafaelauler@fb.com>
Test Plan: NFC
As part of the effort to refactor old error handling code that
would directly call exit(1), in this patch continue the migration
on libCore, libRewrite and libPasses to use the new BOLTError
class whenever a failure occurs.
Test Plan: NFC
Co-authored-by: Rafael Auler <rafaelauler@fb.com>
As part of the effort to refactor old error handling code that
would directly call exit(1), in this patch we add a new class
BOLTError and auxiliary functions `createFatalBOLTError()` and
`createNonFatalBOLTError()` that allow BOLT code to bubble up the
problem to the caller by using the Error class as a return
type (or Expected). Also changes passes to use these.
Co-authored-by: Rafael Auler <rafaelauler@fb.com>
Test Plan: NFC
As part of the effort to refactor old error handling code that
would directly call exit(1), in this patch we change the
interface to `BinaryFunctionPass` to return an Error on
`runOnFunctions()`. This gives passes the ability to report a
serious problem to the caller (RewriteInstance class), so the
caller may decide how to best handle the exceptional situation.
Co-authored-by: Rafael Auler <rafaelauler@fb.com>
Test Plan: NFC
Check if program header addresses fall into the kernel space to detect a
Linux kernel binary on x86-64.
Delete opts::LinuxKernelMode and use BinaryContext::IsLinuxKernel
instead.
Some metadata needs to be updated/finalized before the binary context is
emitted into the binary. Add the interface and use it for Linux ORC
update invocation.
A new function sorting algorithm (cdsort) in LLVM is an optimized
version of BOLT's hfsort+. In order to avoid code duplication and
simplify maintenance, getting rid of hfsort+.
Perf-wise this is likely a neutral change, though differences on
individual benchmarks are possible, since the generated function layout
has changed. I tested cdsort vs hfsort+ on a number of open-source and
prod binaries built in different modes and record an average neutral
perf difference, perhaps with more "green" counters.
Encode BRANCHENTRY bits as bitmask for deduplicated entries.
Reduces BAT section size:
- large binary: to 11834216 bytes (0.31x original),
- medium binary: to 1565584 bytes (0.26x original),
- small binary: to 336 bytes (0.23x original).
Test Plan: Updated bolt/test/X86/bolt-address-translation.test
Make output function addresses be delta-encoded wrt last offset in the
previous function. This reduces the deltas in function start addresses.
Test Plan:
Reduces BAT section size to:
- large binary: 12218860 bytes (0.32x original),
- medium binary: 1606580 bytes (0.27x original),
- small binary: 404 bytes (0.28x original),
Reviewers: rafaelauler
Reviewed By: rafaelauler
Pull Request: https://github.com/llvm/llvm-project/pull/76904
Reduces BAT section size:
- large binary: to 12283500 bytes (0.32x original size),
- medium binary: to 1616020 bytes (0.27x original size),
- small binary: to 404 bytes (0.28x original size).
Test Plan: Updated bolt/test/X86/bolt-address-translation.test
There was an assumpiton that TUs and CUs share .debug_str_offsets
contribution. For ThinLTO builds it is not the case. Changed so that we
parse contributions for TUs also, and did some refactoring so that we
don't re-parse contributions that were not modified.
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.
I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.