Under normal circumstances, we terminate basic blocks on a trap
instruction. However, Linux kernel may resume execution after hitting a
trap (ud2 on x86). Thus, we introduce "--terminal-trap" option that will
specify if the trap instruction should terminate the control flow. The
option is on by default except for the Linux kernel mode when it's off.
Make them start with 1 instead of 0 (reserved for primary entry point).
Test Plan:
```
bin/llvm-lit -a tools/bolt/test/X86/yaml-secondary-entry-discriminator.s
```
Reviewers: rafaelauler, ayermolo, maksfb, dcci
Reviewed By: maksfb
Pull Request: https://github.com/llvm/llvm-project/pull/86848
The DW_AT_abstract_origin can be a cross-cu reference as a by-product of
LTO. On IR level for absolute references an address is stored, vs a DIE
for relative references. Added a map to keep track of cross-cu
referenced DIEs to use when we add an Entry.
Runtime code modification used by static keys is the most ubiquitous
self-modifying feature of the Linux kernel. The idea is to to eliminate
the condition check and associated conditional jump on a hot path if
that condition (based on a boolean value of a static key) does not
change often. Whenever they condition changes, the kernel runtime
modifies all code paths associated with that key flipping the code
between nop and (unconditional) jump.
According to the DWARF spec a DIE that has DW_AT_specification or
DW_AT_abstract_origin can be part of .debug_name if a DIE those
attribute points to has DW_AT_name or DW_AT_linkage_name.
The foreign TU list immediately follows the local TU list and they both
use the same index, so that if there are N local TU entries, the index
for the first foreign TU is N.
Changed so that the size of local TU is accounted for when setting
foreign TU index.
The base class llvm::ThreadPoolInterface will be renamed
llvm::ThreadPool in a subsequent commit.
This is a breaking change: clients who use to create a ThreadPool must
now create a DefaultThreadPool instead.
Add R_AARCH64_JUMP26 implementation for createRelocation, which
could significantly reduce the number of failed scan-refs cases if we
perform bolt on a selective range of functions.
Add an external interface to register a branch in a function that is in
disassembled state. Allows to make custom modifications to the
disassembler. E.g., a pre-CFG pass can add an instruction and register a
branch that will later be used during the CFG construction.
Move code that sorts TakenBranches right before the branches are used.
We can populate TakenBranches in pre-CFG post-processing and hence have
to postpone the sorting to a later point in the processing pipeline.
Will add such a pass later. For now it's NFC.
To avoid accidentally setting the label twice for the same instruction,
which can lead to a "lost" label, introduce getOrSetInstLabel()
function. Rename existing functions to getInstLabel()/setInstLabel() to
make it explicit that they operate on instruction labels. Add an
assertion in setInstLabel() that the instruction did not have a prior
label set.
DWARF5 spec supports the .debug_names acceleration table. This is the
formalized version of combination of gdb-index/pubnames/types. Added
implementation of it to BOLT. It supports both monolothic and split
dwarf, with and without Type Units. It does not include parent indices.
This will be in followup PR. Unlike LLVM output this will put all the
CUs and TUs into one Module.
The change in #80950 exposed a memory leak in BinarySection. Let
BinarySection manage memory passed via updateContents() unless a valid
SectionID is set indicating that the contents are managed by JITLink.
GCC can generate rangelists/loclists that are out of order. Fixed so
that we don't assert, and instead generate partially optimized list.
Through most code paths we do sort rnglists/loclists, but not for
loclist for a path where BOLT does not modify a function. Although it's
nice to have lists sorted, this implementation shouldn't rely on it.
This also fixes an issue if we partially capture a list we would write
out *end_of_list in helper function. So tools won't see the rest of the
addresses being written out.
Make core BOLT functionality more friendly to being used as a
library instead of in our standalone driver llvm-bolt. To
accomplish this, we augment BinaryContext with journaling streams
that are to be used by most BOLT code whenever something needs to
be logged to the screen. Users of the library can decide if logs
should be printed to a file, no file or to the screen, as
before. To illustrate this, this patch adds a new option
`--log-file` that allows the user to redirect BOLT logging to a
file on disk or completely hide it by using
`--log-file=/dev/null`. Future BOLT code should now use
`BinaryContext::outs()` for printing important messages instead of
`llvm::outs()`. A new test log.test enforces this by verifying that
no strings are print to screen once the `--log-file` option is
used.
In previous patches we also added a new BOLTError class to report
common and fatal errors, so code shouldn't call exit(1) now. To
easily handle problems as before (by quitting with exit(1)),
callers can now use
`BinaryContext::logBOLTErrorsAndQuitOnFatal(Error)` whenever code
needs to deal with BOLT errors. To test this, we have fatal.s
that checks we are correctly quitting and printing a fatal error
to the screen.
Because this is a significant change by itself, not all code was
yet ported. Code from Profiler libs (DataAggregator and friends)
still print errors directly to screen.
Co-authored-by: Rafael Auler <rafaelauler@fb.com>
Test Plan: NFC
As part of the effort to refactor old error handling code that
would directly call exit(1), in this patch continue the migration
on libCore, libRewrite and libPasses to use the new BOLTError
class whenever a failure occurs.
Test Plan: NFC
Co-authored-by: Rafael Auler <rafaelauler@fb.com>
As part of the effort to refactor old error handling code that
would directly call exit(1), in this patch we add a new class
BOLTError and auxiliary functions `createFatalBOLTError()` and
`createNonFatalBOLTError()` that allow BOLT code to bubble up the
problem to the caller by using the Error class as a return
type (or Expected). Also changes passes to use these.
Co-authored-by: Rafael Auler <rafaelauler@fb.com>
Test Plan: NFC
We can use BinarySection::updateContents() to change section contents.
However, if we also add relocations for new contents, then the original
data (i.e. not updated) is going to be used. Fix that. A follow-up diff
will use the update interface and will include a test case.
Added an --comp-dir-override option that overrides DW_AT_comp_dir in the
unit die. This allows for llvm-bolt to be invoked from any category and
still find .dwo files.
Fix the bug where merge-fdata unconditionally outputs boltedcollection
line, regardless of whether input files have it set.
Test Plan:
Added bolt/test/X86/merge-fdata-nobat-mode.test which fails without this
fix.
We would always allocate maximum amount for vector containing
DWARFUnitInfo. In real usecases what ends up hapenning is we allocate a
giant vector when processing one CU, or for thin-lto case multiple CUs.
This lead to a lot of memory overhead, and 2x BOLT processing slowdown
for at least one service built with monolithic DWARF.
For binaries built with LTO with clang all of CUs that have cross
references will share an abbrev table and will be processed in one
batch. Rest of CUs are processesd in --cu-processing-batch-size size.
Which defaults to 1.
For theoretical cases where cross-cu references are present, but they do
not share abbrev will increase the size of CloneUnitCtxMap as each CU is
being processsed.
There was an assumpiton that TUs and CUs share .debug_str_offsets
contribution. For ThinLTO builds it is not the case. Changed so that we
parse contributions for TUs also, and did some refactoring so that we
don't re-parse contributions that were not modified.
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.
I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
This patch fixes:
bolt/lib/Core/BinaryFunctionProfile.cpp:222:10: error: variable
'BBMergeSI' set but not used [-Werror,-Wunused-but-set-variable]
bolt/lib/Passes/VeneerElimination.cpp:67:12: error: variable
'VeneerCallers' set but not used [-Werror,-Wunused-but-set-variable]
Provide backwards compatibility for YAML profile that uses `std::hash`:
xxh3 hash is the default for newly produced profile (sets `std-hash:
false`),
whereas the profile that doesn't specify `std-hash` will be treated as
`std-hash: true`, preserving old behavior.
This diff defines and initializes auxiliary variables used by CDSplit
and implements two important helper functions. The first helper function
approximates the block level size increase if a function is hot-warm
split at a given split index (X86 specific). The second helper function
finds all calls in the form of X->Y or Y->X for each BF given function
order [... X ... BF ... Y ...]. These calls are referred to as "cover
calls". Their distance will decrease if BF's hot fragment size is
further reduced by hot-warm splitting. NFC.
This commit explicitly adds a warm code section, .text.warm, when
-split-functions -split-strategy=cdsplit is used. This replaces the
previous approach of using .text.cold.0 as warm and .text.cold.1 as cold
in 3-way function splitting. NFC.
Simplify code in fixBranches(). Mostly NFC, accept the x86-specific
check for code fragments now takes into account presence of more than
two fragments. Should only matter when we split code into multiple
fragments and can run fixBranches() more than once.
Also, don't replace a branch target with the same one, as such operation
may allocate memory for extra MCSymbolRefExpr.
Fixed handling of DWP as input. Before BOLT crashed. Now it will write
out
correct CU, and all the TUs. Potential future improvement is to scan all
the TUs
used in this CU, and only include those.
std::hash and ADT/Hashing::hash_value are non-deterministic functions
whose
results might vary across implementation/process/execution. Using xxh3
instead
for computing hashes of BinaryFunctions and BinaryBasicBlock for stale
profile
matching.
(A possible alternative is to use ADT/StableHashing.h based on FNV
hashing but
xxh3 seems to be more popular in LLVM)
This is to address https://github.com/llvm/llvm-project/issues/65241.
This is a follow-up to #73076. We need to reset output addresses for
deleted blocks, otherwise the address translation may mistakenly
attribute input address of a deleted block to a non-zero address.
While working on a test case, I've discovered that DWARF output ranges
were already broken for deleted basic blocks: #73428. I will provide a
test case for this PR with a DWARF address range fix PR.
This commit modifies BinaryContext::calculateEmittedSize() to update
the BinaryBasicBlock::OutputAddressRange of each basic block in the
function in place. BinaryBasicBlock::getOutputSize() now gives the
emitted size of the basic block.
Whenever LPStartEncoding was different from DW_EH_PE_omit, we used to
miscalculate LPStart. As a result, landing pads were assigned wrong
addresses. Fix that.
Now PIE is default supported after clang 14. It cause parsing error when
using perf2bolt. The reason is the base address can not get correctly.
Fix the method of geting base address. If SegInfo.Alignment is not equal
to pagesize, alignDown(SegInfo.FileOffset, SegInfo.Alignment) can not
equal to FileOffset. So the SegInfo.FileOffset and FileOffset should be
aligned by SegInfo.Alignment first and then judge whether they are
equal.
The .text segment's offset from base address in VAS is aligned by
pagesize. So MMapAddress's offset from base address is
alignDown(SegInfo.Address, pagesize) instead of
alignDown(SegInfo.Address, SegInfo.Alignment). So the base address
calculate way should be changed.
Co-authored-by: Li Zhuohang <lizhuohang3@huawei.com>