Bolt makes use of add_llvm_library and as such ends up exporting its
libraries from LLVMExports.cmake, which is not correct.
Bolt doesn't have its own exports file, and I assume that there is no
desire to have one either -- Bolt libraries are not intended to be
consumed as a cmake module, right?
As such, this PR adds a NO_EXPORT option to simplify exclude these
libraries from the exports file.
- **Reapply "[BOLT] Add --pad-funcs-before=func:n (#117924)"**
- **[BOLT] Fix --pad-funcs{,-before} state misinteraction**
When --pad-funcs-before was introduced, it introduced a bug whereby the
first one to get parsed could influence the other.
Ensure that each has its own state and test that they don't interact in
this manner by testing how the `_subsequent` symbol moves when both
arguments are supplied with different padding values.
Fixed by having a function (and static state) for each of before/after.
14dcf8214f introduced a subtle bug with
the static `FunctionPadding` map.
If either `opts::FunctionPadSpec` or `opts::FunctionPadBeforeSpec` are set,
the map is going to be populated with the respective spec in the first
invocation of `BinaryEmitter::emitFunction`. The subsequent invocations
will pick up the padding from the map irrespective of whether
`opts::FunctionPadSpec` or `opts::FunctionPadBeforeSpec` is passed as a
parameter.
This breaks an internal test, hence reverting the patch.
This patch makes sure that `BinaryContext::printInstruction` prints the
preferred disassembly. Preferred disassembly only gets printed when
there are no annotations on the MCInst. Therefore, this patch
temporarily removes the annotations before printing it.
A few examples of before and after on AArch64 instructions are as
follows:
```
BEFORE AFTER
(preferred disassembly)
ret x30 ret
orr x30, xzr, x0 mov x30, x0
hint #29 autiasp
hint #12 autia1716
```
Clearly, the preferred disassembly is easier for developers to read, and
is the disassembly that tools should be printing.
This patch is motivated as part of future work on the
llvm-bolt-binary-analysis tool, making sure that the reports it prints
do use preferred disassembly.
This patch was cherry-picked from
https://github.com/kbeyls/llvm-project/tree/bolt-gadget-scanner-prototype.
In this current patch, this only affects existing RISCV test cases.
This patch also does improve test cases in future patches that will
introduce a binary analysis for llvm-bolt-binary-analysis that checks
for correct application of pac-ret (pointer authentication on return
addresses).
Identical Code Folding (ICF) folds functions that are identical into one
function, and updates symbol addresses to the new address. This reduces
the size of a binary, but can lead to problems. For example when
function pointers are compared. This can be done either explicitly in
the code or generated IR by optimization passes like Indirect Call
Promotion (ICP). After ICF what used to be two different addresses
become the same address. This can lead to a different code path being
taken.
This is where safe ICF comes in. Linker (LLD) does it using address
significant section generated by clang. If symbol is in it, or an object
doesn't have this section symbols are not folded.
BOLT does not have the information regarding which objects do not have
this section, so can't re-use this mechanism.
This implementation scans code section and conservatively marks
functions symbols as unsafe. It treats symbols as unsafe if they are
used in non-control flow instruction. It also scans through the data
relocation sections and does the same for relocations that reference a
function symbol. The latter handles the case when function pointer is
stored in a local or global variable, etc. If a relocation address
points within a vtable these symbols are skipped.
Added support to BOLT for DW_OP_GNU_push_tls_address. So now
DW_TAG_variable with this OP in DW_AT_location will appear in debug
names acceleration table. Although not in the DWARF 5 spec it is similar
to DW_OP_form_tls_address. Without this support llvm-dwarfdump --verify
--debug-names will report errors.
This change affects non-relocation mode only. Prior to having
CheckLargeFunctions pass, we could have emitted code for functions that
was discarded at the end due to size limitations. Since we didn't know
at the time of emission if the code would be discarded or not, we had to
emit jump tables in separate sections and handle them separately.
However, now we always run CheckLargeFunctions and make sure all emitted
code is used. Thus, we can get rid of the special jump table handling.
This fix handles a case where a DIE that does not have
DW_AT_name/DW_AT_linkage_name, but has a reference to another DIE using
DW_AT_abstract_origin/DW_AT_specification. It also fixes a bug where
there are cross CU references for those attributes. Previously it would
use a DWARF Unit of a DIE which was being processed The
warf5-debug-names-cross-cu.s test just happened to work because how it
was constructed where string section was shared by both DWARF Units.
To resolve DW_AT_name/DW_AT_linkage_name this patch iterates over
references until it either reaches the final DIE or finds both of those
names.
Use SymbolStringPtr for Symbol names in LinkGraph. This reduces string interning
on the boundary between JITLink and ORC, and allows pointer comparisons (rather
than string comparisons) between Symbol names. This should improve the
performance and readability of code that bridges between JITLink and ORC (e.g.
ObjectLinkingLayer and ObjectLinkingLayer::Plugins).
To enable use of SymbolStringPtr a std::shared_ptr<SymbolStringPool> is added to
LinkGraph and threaded through to its construction sites in LLVM and Bolt. All
LinkGraphs that are to have symbol names compared by pointer equality must point
to the same SymbolStringPool instance, which in ORC sessions should be the pool
attached to the ExecutionSession.
---------
Co-authored-by: Lang Hames <lhames@gmail.com>
This change extracts the comparator for sorting functions by index into
a helper function `compareBinaryFunctionByIndex()`
Not sure why the comparator used in
`BinaryContext::getSortedFunctions()` is not same as the other two
places. I think they should use the same comparator, so I also change
`BinaryContext::getSortedFunctions()` to use
`compareBinaryFunctionByIndex()` for sorting functions.
Use ULEB128 format for emitting LSDAs for fixed-address executables,
similar to what we use for PIEs/DSOs. Main difference is that we don't
use landing pad trampolines when landing pads are not contained in a
single fragment. Instead, we fallback to emitting larger fixed-address
LSDAs, which is still better than adding trampoline instructions.
We used to emit EH trampolines for PIE/DSO whenever a function fragment
contained a landing pad outside of it. However, it is common to have all
landing pads in a cold fragment even when their throwers are in a hot
one.
To reduce the number of trampolines, analyze landing pads for any given
function fragment, and if they all belong to the same (possibly
different) fragment, designate that fragment as a landing pad fragment
for the "thrower" fragment. Later, emit landing pad fragment symbol as
an LPStart for the thrower LSDA.
Use ULEB128 encoding for call sites in PIE/DSO binaries. The encoding
reduces the size of the tables compared to sdata4 and is the default
format used by Clang.
Note that for fixed-address executables we still use absolute addressing
to cover cases where landing pads can reside in different function
fragments.
For testing, we rely on runtime EH tests.
For C++ exception handling, when we write a call site table, we must
avoid emitting 0-value offsets for landing pads unless the call site has
no landing pad. However, 0 can be a real offset from the start of the
FDE if the FDE corresponds to a function fragment that starts with a
landing pad. In such cases, we used to emit a trap instruction at the
start of the fragment to guarantee non-zero LP offset.
To avoid emitting unnecessary trap instructions, we can instead set
LPStart to an offset from the FDE. If we emit it as [FDEStart - 1], then
all real offsets from LPStart in FDE become non-negative.
Under --use-old-text or --strict, we completely rewrite contents of EH
frames and exception tables sections. If new contents of either section
do not exceed the size of the original section, rewrite the section
in-place.
When emitting C++ exception tables (LSDAs), BOLT used to estimate the
size of the tables beforehand. This implementation was necessary as the
assembler/streamer lacked the emitULEB128IntValue() functionality.
As I plan to introduce [u|s]uleb128-encoded exception tables in BOLT,
now is a perfect time to switch to the new API and eliminate the need
to pre-compute the size of the tables.
"Large" functions are functions that are too big to fit into their
original slots after code modifications. CheckLargeFunctions pass is
designed to prevent such functions from emission. Extend this pass to
work with functions with constant islands.
Now that CheckLargeFunctions covers all functions, it guarantees that we
will never see such functions after code emission on all platforms
(previously it was guaranteed on x86 only). Hence, we can get rid of
RewriteInstance extensions that were meant to support "large" functions.
These specify that the value of the given register in the previous frame
is the CFA plus some offset. This isn't very common but can be necessary
if the original value is normally reconstructed from the stack/frame
pointer instead of being saved on the stack and reloaded from there.
Add `--compact-code-model` option that executes alternative branch
relaxation with an assumption that the resulting binary has less than
128MB of code. The relaxation is done in `relaxLocalBranches()`, which
operates on a function level and executes on multiple functions in
parallel.
Running the new option on AArch64 Clang binary produces slightly smaller
code and the relaxation finishes in about 1/10th of the time.
Note that the new `.text` has to be smaller than 128MB, *and* `.plt` has
to be closer than 128MB to `.text`.
This patch fixes:
bolt/lib/Core/BinaryFunction.cpp:2537:13: error: enumeration value
'OpNegateRAStateWithPC' not handled in switch [-Werror,-Wswitch]
bolt/lib/Core/BinaryFunction.cpp:2661:13: error: enumeration value
'OpNegateRAStateWithPC' not handled in switch [-Werror,-Wswitch]
bolt/lib/Core/BinaryFunction.cpp:2805:13: error: enumeration value
'OpNegateRAStateWithPC' not handled in switch [-Werror,-Wswitch]
While printing functions, expand --print-only flag to accept section
names. E.g., "--print-only=\.init" will only print functions from
".init" section.
… segments in Elf binary.
The heuristic is improved by also taking into account that only
executable segments should contain instructions.
Fixes#109384.
This PR improves how basic block execution count is updated when using
the BOLT option `-infer-fall-throughs`. Previously, if a 0-count
fall-through edge is assigned a positive inferred count N, then the
successor block's execution count will be incremented by N. Since the
successor's execution count is calculated using information besides
inflow sum (such as outflow sum), it likely is already correct, and
incrementing it by an additional N would be wrong. This PR improves how
the successor's execution count is updated by using the max over its
current count and N.
CFI programs may have more saves than restores and this is completely
benign from BOLT's perspective. Reduce the verbosity and print the
warning only under `-v=1` and above.
This patch aborts BOLT execution if it finds out-of-section (section
end) symbol in GOT table. In order to handle such situations properly in
future, we would need to have an arch-dependent way to analyze
relocations or its sequences, e.g., for ARM it would probably be ADRP +
LDR analysis in order to get GOT entry address. Currently, it is also
challenging because GOT-related relocation symbols are replaced to
__BOLT_got_zero. Anyway, it seems to be quite a rare case, which seems
to be only? related to static binaries. For the most part, it seems that
it should be handled on the linker stage, since static binary should not
have GOT table at all. LLD linker with relaxations enabled would replace
instruction addresses from GOT directly to target symbols, which
eliminates the problem.
Anyway, in order to achieve detection of such cases, this patch fixes a
few things in BOLT:
1. For the end symbols, we're now using the section provided by ELF
binary. Previously it would be tied with a wrong section found by symbol
address.
2. The end symbols would have limited registration we would only
add them in name->data GlobalSymbols map, since using address->data
BinaryDataMap map would likely be impossible due to address duality of
such symbols.
3. The outdated BD->getSection (currently returning refence, not
pointer) check in postProcessSymbolTable is replaced by getSize check in
order to allow zero-sized top-level symbols if they are located in
zero-sized sections. For the most part, such things could only be found
in tests, but I don't see a reason not to handle such cases.
4. Updated section-end-sym test and removed x86_64 requirement since
there is no reason for this (tested on aarch64 linux)
The test was provided by peterwaller-arm (thank you) in #100096 and
slightly modified by me.
Multi-way splitting can cause multiple fragments to access the same jump
table. Relax the assumption that a jump table can only have up to two
parents.
Test Plan: added bolt/test/X86/three-way-split-jt.s
Reviewers: ayermolo, dcci, rafaelauler, maksfb
Reviewed By: rafaelauler, dcci
Pull Request: https://github.com/llvm/llvm-project/pull/99988
Three-way splitting can create references between split fragments (warm
to cold or vice versa) that are not handled by
`isChildOf/isParentOf/isChildOrParentOf`. Generalize fragment
relationships to allow checking if two functions belong to one group,
potentially in presence of ICF which can join multiple groups.
Test Plan: NFC for existing tests
Reviewers: maksfb, ayermolo, rafaelauler, dcci
Reviewed By: rafaelauler
Pull Request: https://github.com/llvm/llvm-project/pull/99979
Detect and support fixed PIC indirect jumps of the following form:
```
movslq En(%rip), %r1
leaq PIC_JUMP_TABLE(%rip), %r2
addq %r2, %r1
jmpq *%r1
```
with PIC_JUMP_TABLE that looks like following:
```
JT: ----------
E1:| L1 - JT |
|----------|
E2:| L2 - JT |
|----------|
| |
......
En:| Ln - JT |
----------
```
The code could be produced by compilers, see
https://github.com/llvm/llvm-project/issues/91648.
Test Plan: updated jump-table-fixed-ref-pic.test
Reviewers: maksfb, ayermolo, dcci, rafaelauler
Reviewed By: rafaelauler
Pull Request: https://github.com/llvm/llvm-project/pull/91667