Under normal circumstances, we terminate basic blocks on a trap
instruction. However, Linux kernel may resume execution after hitting a
trap (ud2 on x86). Thus, we introduce "--terminal-trap" option that will
specify if the trap instruction should terminate the control flow. The
option is on by default except for the Linux kernel mode when it's off.
Update instruction locations in the __bug_table section after new code
is emitted. If an instruction with associated bug ID was deleted,
overwrite its location with zero.
For DWARF5 BOLT was not retreiving address and instead was setting an
index.
Changed so that an address is used, and added DWARF4 test because it was
missing.
Relax assumptions that YAML output is not supported in BAT mode.
Set up basic infrastructure for emitting YAML for functions not covered
by BAT, such as from `.bolt.org.text` section (code identical to input binary
sans external refs), or non-rewritten functions in non-relocation mode (where
the function stays in the same section but BAT mapping is not emitted).
This diff only produces YAML profile for non-BAT functions (skipped,
non-simple). YAML profile for BAT functions is added in follow-up diffs:
- https://github.com/llvm/llvm-project/pull/76911 emits YAML profile with
internal control flow information only (branch profile),
- https://github.com/llvm/llvm-project/pull/76896 adds cross-function profile
(calls profile).
Test Plan: Added bolt/test/X86/bolt-address-translation-yaml.test
Reviewers: ayermolo, dcci, maksfb, rafaelauler
Reviewed By: rafaelauler
Pull Request: https://github.com/llvm/llvm-project/pull/76910
Runtime code modification used by static keys is the most ubiquitous
self-modifying feature of the Linux kernel. The idea is to to eliminate
the condition check and associated conditional jump on a hot path if
that condition (based on a boolean value of a static key) does not
change often. Whenever they condition changes, the kernel runtime
modifies all code paths associated with that key flipping the code
between nop and (unconditional) jump.
commit 43a2ec483f
Author: Jonas Devlieghere <jonas@devlieghere.com>
Date: Tue Mar 19 08:30:47 2024 -0700
removed parameter Translator from the constructor of DwarfStreamer.
This patch fixes the build by updating the constructor of DIEStreamer
accordingly.
.pci_fixup section contains a table with entries allowing to invoke a
fixup hook whenever a problem is encountered with a PCI device. The
hookup code typically points to the start of a function. As we are not
relocating functions in the kernel (at least not yet), verify this
assumption while reading the table and ignore any functions with a fixup
code in the middle.
Read .altinstructions and annotate instructions that have alternative
sequences with "AltInst" annotation. Note that some instructions may
have more than one alternatives, in which case they will have multiple
annotations in the form "AltInst", "AltInst2", "AltInst3", etc.
Read Linux exception table and ignore functions with exceptions for now.
Proper support requires an introduction of new control flow since some
instructions with memory access can cause a control flow change.
Hence looking at disassembly or CFG with exceptions annotations is
valuable for code analysis, delay marking functions with exceptions as
non-simple until immediately before emitting the code.
To avoid accidentally setting the label twice for the same instruction,
which can lead to a "lost" label, introduce getOrSetInstLabel()
function. Rename existing functions to getInstLabel()/setInstLabel() to
make it explicit that they operate on instruction labels. Add an
assertion in setInstLabel() that the instruction did not have a prior
label set.
DWARF5 spec supports the .debug_names acceleration table. This is the
formalized version of combination of gdb-index/pubnames/types. Added
implementation of it to BOLT. It supports both monolothic and split
dwarf, with and without Type Units. It does not include parent indices.
This will be in followup PR. Unlike LLVM output this will put all the
CUs and TUs into one Module.
The change in #80950 exposed a memory leak in BinarySection. Let
BinarySection manage memory passed via updateContents() unless a valid
SectionID is set indicating that the contents are managed by JITLink.
Static calls are calls that are getting patched during runtime. Hence,
for every such call the kernel runtime needs the location of the call or
jmp instruction that will be patched. Instruction locations together
with a corresponding key are stored in the static call site table. As
BOLT rewrites these instructions it needs to update the table.
Update ORC information based on the new code layout and emit
corresponding ORC sections for the Linux kernel.
We rewrite ORC sections in place, which puts a limit on the size of new
section contents. Since ORC info changes for the new code layout and the
number of ORC entries can become larger, we free up space in the tables
by removing redundant ORC terminators. As a result, we effectively emit
fewer entries and have to add duplicate terminators at the end to match
the original section sizes. Ideally, we need to update ORC boundaries to
reflect the reduced size and optimize runtime lookup, but we will need
relocations for this, and the benefits will be marginal, if any.
Make core BOLT functionality more friendly to being used as a
library instead of in our standalone driver llvm-bolt. To
accomplish this, we augment BinaryContext with journaling streams
that are to be used by most BOLT code whenever something needs to
be logged to the screen. Users of the library can decide if logs
should be printed to a file, no file or to the screen, as
before. To illustrate this, this patch adds a new option
`--log-file` that allows the user to redirect BOLT logging to a
file on disk or completely hide it by using
`--log-file=/dev/null`. Future BOLT code should now use
`BinaryContext::outs()` for printing important messages instead of
`llvm::outs()`. A new test log.test enforces this by verifying that
no strings are print to screen once the `--log-file` option is
used.
In previous patches we also added a new BOLTError class to report
common and fatal errors, so code shouldn't call exit(1) now. To
easily handle problems as before (by quitting with exit(1)),
callers can now use
`BinaryContext::logBOLTErrorsAndQuitOnFatal(Error)` whenever code
needs to deal with BOLT errors. To test this, we have fatal.s
that checks we are correctly quitting and printing a fatal error
to the screen.
Because this is a significant change by itself, not all code was
yet ported. Code from Profiler libs (DataAggregator and friends)
still print errors directly to screen.
Co-authored-by: Rafael Auler <rafaelauler@fb.com>
Test Plan: NFC
As part of the effort to refactor old error handling code that
would directly call exit(1), in this patch continue the migration
on libCore, libRewrite and libPasses to use the new BOLTError
class whenever a failure occurs.
Test Plan: NFC
Co-authored-by: Rafael Auler <rafaelauler@fb.com>
As part of the effort to refactor old error handling code that
would directly call exit(1), in this patch we change the
interface to `BinaryFunctionPass` to return an Error on
`runOnFunctions()`. This gives passes the ability to report a
serious problem to the caller (RewriteInstance class), so the
caller may decide how to best handle the exceptional situation.
Co-authored-by: Rafael Auler <rafaelauler@fb.com>
Test Plan: NFC
We run CheckLargeFunctions pass in non-relocation mode to prevent the
emission of functions that later could not be written to the output due
to their large size. The main reason behind the pass is to prevent the
emission of metadata for such functions since this metadata becomes
incorrect if the function is left unmodified.
Currently, the pass is enabled in non-relocation mode only when debug
info output is also enabled. As we emit increasingly more kinds of
metadata, e.g. for the Linux kernel, it becomes more challenging to
track metadata that needs to be fixed. Hence, I'm enabling the pass to
always run in non-relocation mode.
Write modified Linux kernel binary to disk. The output is not supposed
to be functional at the moment, but it will allow for future patches to
test the output binary.
When we adjust section sizes while rewriting a binary, we should be
using section offsets and not addresses to determine if section overlap.
NFC for existing binaries.
Check if program header addresses fall into the kernel space to detect a
Linux kernel binary on x86-64.
Delete opts::LinuxKernelMode and use BinaryContext::IsLinuxKernel
instead.
Some metadata needs to be updated/finalized before the binary context is
emitted into the binary. Add the interface and use it for Linux ORC
update invocation.
Added an --comp-dir-override option that overrides DW_AT_comp_dir in the
unit die. This allows for llvm-bolt to be invoked from any category and
still find .dwo files.
It was noted that new DWARFLinker libraries do not follow naming
agreement -
https://github.com/llvm/llvm-project/pull/75925#issuecomment-1883301659
This patch rename libraries to match with the agreement.
Rename LLVMDWARFLinkerBase library into the LLVMDWARFLinker. Rename
LLVMDWARFLinker library into the LLVMDWARFLinkerClassic. Correct include
path according to the new directory structure.
This patch creates DWARFLinkerBase library, places DWARFLinker code into
DWARFLinker\Classic, places DWARFLinkerParallel into DWARFLinker\Parallel.
updates BOLT to use new library. This patch is NFC.
Adds a function to create a LinkGraph of absolute symbols, and a
callback in dynamic library search generators to enable using it to
expose its symbols to the platform/orc runtime. This allows e.g. using
__orc_rt_run_program to run a precompiled function that was found via
dlsym. Ideally we would use this in llvm-jitlink's own search generator,
but it will require more work to align with the Process/Platform
JITDylib split, so not handled here.
As part of this change we need to handle LinkGraphs that only have
absolute symbols.
There was an assumpiton that TUs and CUs share .debug_str_offsets
contribution. For ThinLTO builds it is not the case. Changed so that we
parse contributions for TUs also, and did some refactoring so that we
don't re-parse contributions that were not modified.