Commit Graph

2585 Commits

Author SHA1 Message Date
Maksim Panchenko
7d6fda4fd3 [BOLT] Run PatchEntries pass before LongJmp (#137236)
With --force-patch option, every original function entry point is
overwritten with a trampoline to a new version of the function to
prevent the execution of the original code.

If the function size is too small for the trampoline code, we are forced
to bail out on rewriting the function. That presented a problem on
AArch64 due to LongJmp pass that assumed the presence of the new copy of
the function. If the new copy was not emitted, it could have led to a
relocation overflow.

Run PatchEntries pass before LongJmp and make the latter aware of the
functions that are not going to be emitted. Make --force-patch option
behavior on AArch64 consistent with other architectures.
2025-05-01 15:09:09 -07:00
Gergely Bálint
5b20b5721a [BOLT][AArch64] Allow binary-analysis and heatmap tool to run with pac-ret binaries (#136664)
OpNegateRAState support is only needed for tools that produce binaries.
2025-04-30 13:41:11 +01:00
Elvina Yakubova
5cec6f6f2d [BOLT][NFC] Add keep-nops option to non-empty-debug-line.test (#137812)
On the openSUSE distribution, the test fails due to a different .debug_line
size without the keep-nops option.
2025-04-29 18:16:36 +01:00
YongKang Zhu
316a6ff3d0 [BOLT][RelVTable] Skip special handling on non virtual function pointer relocations (#137406)
Besides virtual function pointers, a vtable can contain other kinds of
entries, such as those for RTTI data, that also require relocations. We need
to skip special handling of relocations for non-virtual-function pointers
in a relative vtable.

Co-authored-by: Maksim Panchenko <maks@meta.com>
2025-04-29 08:13:44 -07:00
Owen Rodley
d3d856ad84 Clean up external users of GlobalValue::getGUID(StringRef) (#129644)
See https://discourse.llvm.org/t/rfc-keep-globalvalue-guids-stable/84801
for context.

This is a non-functional change which just changes the interface of
GlobalValue, in preparation for future functional changes. This part
touches a fair few users, so is split out for ease of review. Future
changes to the GlobalValue implementation can then be focused purely on
that class.

This does the following:

* Rename GlobalValue::getGUID(StringRef) to
  getGUIDAssumingExternalLinkage. This is simply making explicit at the
  callsite what is currently implicit.
* Where possible, migrate users to directly calling getGUID on a
  GlobalValue instance.
* Otherwise, where possible, have them call the newly renamed
  getGUIDAssumingExternalLinkage, to make the assumption explicit.


There are a few cases where neither of the above are possible, as the
caller saves and reconstructs the necessary information to compute the
GUID themselves. We want to migrate these callers eventually, but for
this first step we leave them be.
2025-04-28 11:09:43 +10:00
cor3ntin
320ec7fa7f [Documentation] Always use SVG for dot-generated doxygen images. (#136843)
Despite our attempt (build-docs.sh)
to build the documentation with SVG,
it still uses PNG (see https://llvm.org/doxygen/classllvm_1_1StringRef.html),
and that renders terribly on any high-DPI display.

SVG leads to a smaller installation and works fine
in all browsers (that has been true for _a while_, see
https://caniuse.com/svg), so this patch unconditionally builds all
dot graphs as SVG in all subprojects and removes the option.
2025-04-25 14:13:17 +02:00
Amir Ayupov
5d0afacd1b [BOLT][NFCI] Emit uniform diagnostics in DataAggregator (#136530)
DataAggregator supports reading different kinds of profile data:
- perf data: branch records or IP samples,
- pre-aggregated branch data.

Make profile quality reporting uniform across all kinds of input:
- out-of-range and mismatching samples,
- samples in cold code in BAT mode (profiled BOLTed binary).

Test Plan: NFCI
2025-04-24 13:51:18 -07:00
Anatoly Trosinenko
37e8c6c6ee [BOLT] Do not return Def-ed registers from MCPlusBuilder::getUsedRegs (#129890)
Update the implementation of `MCPlusBuilder::getUsedRegs` to match its
description in the header file, add unit tests.
2025-04-23 13:32:59 +03:00
ShatianWang
ce2b3ce3b6 [BOLT] Improve profile quality reporting (#130810)
Improve profile quality reporting by 1) fixing a format issue for small
binaries, 2) adding new stats for exception handling usage, 3) excluding
selected blocks when computing the CFG flow conservation score.

More specifically for 3), we exclude blocks that satisfy at least
one of the following conditions: a) the block is a landing pad, b) it has at
least one landing pad with non-zero execution counts, c) it ends with a
recursive call. The reason for a) and b) is that the thrower -->
landing pad edges are not explicitly represented in the CFG. The reason
for c) is that the call-continuation fallthrough edge count is not
important in the case of recursive calls.

Modified test `bolt/test/X86/profile-quality-reporting.test`.
Added test `bolt/test/X86/profile-quality-reporting-small-binary.s`.
2025-04-22 15:42:47 -04:00
YongKang Zhu
2dca9e80ff [BOLT][test] Resolve symlink for nm tool (NFC) (#136722)
Handle the case where nm could be a symlink to llvm-nm.
2025-04-22 12:08:56 -07:00
Kazu Hirata
a8644b3d88 [BOLT] Call hash_combine_range with ranges (NFC) (#136524) 2025-04-20 19:41:26 -07:00
Kazu Hirata
c6e7bb19f7 [BOLT] Use llvm::unique (NFC) (#136513) 2025-04-20 18:29:51 -07:00
Rafael Auler
3bcb724903 [BOLT] Add --custom-allocation-vma flag (#136385)
Add an advanced-user flag so we are able to rewrite binaries when we fail
to identify a suitable location to put new code. The user can then supply a
custom location via --custom-allocation-vma. This happens most noticeably when the
binary has segments mapped to very high addresses.
2025-04-18 21:02:09 -07:00
Fangrui Song
c239acb5b6 MCFixup: Make FixupKindInfo smaller and change getFixupKindInfo to return value
We will increase the use of raw relocation types and eliminate fixup
kinds that correspond to relocation types. The getFixupKindInfo
functions will return an rvalue instead. Let's update the return type
from a const reference to a value type.
2025-04-18 20:55:43 -07:00
Rafael Auler
5c4e6c6113 [BOLT] Don't choke on nobits symbols (#136384) 2025-04-18 17:29:24 -07:00
Maksim Panchenko
0977a7130b [BOLT] Skip FDE emission for patch functions (#136224)
Patch functions are used to fix instructions in the original code, i.e.,
they are not functions in a traditional sense, but rather pieces of
emitted code that are embedded into real functions.

We used to emit FDEs for all functions, including patch functions.
However, FDEs for patches are not only unnecessary, but they can lead to
problems with libraries and runtimes that consume FDEs, e.g. C++
exception handling runtime.

Note that we use named patches to fix function entry points and in that
case they behave more like regular functions. Thus we issue FDEs for
those.
2025-04-17 19:58:32 -07:00
Kazu Hirata
2af5e01456 [BOLT][RISCV] Fix MCPlusBuilder instrumentation ifaces (#136211)
a) Due to the different capabilities of the functions implemented,
rename the createCmpJE function
b) Refactor the convertIndirectCallToLoad function to override the
interface.

Patch by WangJee, originally posted in #136129
2025-04-17 15:27:44 -07:00
wangjue
dbb79c30c9 [BOLT][Instrumentation] Initial instrumentation support for RISCV64 (#133882)
This patch adds code generation for RISCV64 instrumentation. The work
involves the following three points:

a) Implement support for instrumenting direct function calls and jumps
on RISC-V. This relies on atomic instructions (used to increment
counters), which are only available on RISC-V when the A extension is
used.

b) Implement support for instrumenting indirect function calls
by implementing the createInstrumentedIndCallHandlerEntryBB and
createInstrumentedIndCallHandlerExitBB interfaces. In this process, we
need to accurately record the target address and IndCallID to ensure
that the indirect call counters are recorded correctly.

c) Implement the RISCV64 BOLT runtime library, including some system
call interfaces written in inline assembly. Compute the difference between
the runtime address of the .text section and its static address in the
section header table, which in turn can be used to search for indirect
call descriptions.

The community code currently has problems with relocation in
some scenarios, but this is unrelated to instrumentation. We
may continue to submit patches to fix the related bugs.
2025-04-16 23:01:00 -07:00
Maksim Panchenko
0b8f817aab [BOLT] Fix conditional compilation of hugify.cpp (#135880)
Fix builds after #117158: do not build hugify.cpp on Apple platforms.
2025-04-15 16:59:05 -07:00
YongKang Zhu
823adc7a2d [BOLT] Validate secondary entry point (#135731)
Some functions have a size of zero in the input binary's symbol
table, such as those produced by an assembler. When figuring out function
sizes, we may create a label symbol if it doesn't point to any constant
island. However, before the function size is known, a marker symbol
cannot be correctly associated with a function, so all such
checks would fail, and we could end up adding a code label pointing
to a constant island as a secondary entry point and later mistakenly
marking the function as not simple.

Querying the global marker symbol array has a large throughput overhead.
Instead, we can run an extra check when post-processing entry points
to identify label symbols that actually point to constant islands.
2025-04-15 13:19:15 -07:00
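The post-processing check described in #135731 can be illustrated with a minimal sketch. The function name, the offset-based representation, and the half-open island ranges are all hypothetical and chosen for illustration only; they are not BOLT's actual data structures.

```python
def validate_secondary_entries(entry_offsets, island_ranges):
    """Drop candidate secondary entry points that fall inside a constant island.

    entry_offsets: candidate entry offsets within a function (hypothetical).
    island_ranges: list of (start, end) half-open offset ranges (hypothetical).
    """
    def in_island(offset):
        return any(start <= offset < end for start, end in island_ranges)

    return [offset for offset in entry_offsets if not in_island(offset)]


# Offset 0x24 points into the island [0x20, 0x30) and is rejected.
valid_entries = validate_secondary_entries([0x0, 0x10, 0x24], [(0x20, 0x30)])
```

Running the check once, after function sizes are known, avoids querying a global symbol array for every candidate label.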
alekuz01
38faf32d23 [BOLT] Enable hugify for AArch64 (#117158)
Add required hugify instrumentation and runtime libraries support for AArch64.
Fixes #58226
Unblocks #62695
2025-04-15 12:59:05 +01:00
YongKang Zhu
2a83c0cc13 [BOLT] Support relative vtable (#135449)
To handle relative vtables, which are enabled with the clang option
`-fexperimental-relative-c++-abi-vtables`, we look for PC-relative
relocations whose fixup locations fall in vtable address ranges.
For such relocations, the actual target is the virtual function itself,
and the addend records the distance between the vtable slot for the
target virtual function and the first virtual function slot in the vtable,
which matches the generated code that calls the virtual function. So
we can skip the logic for handling "function + offset" and directly
save such relocations for future fixup after the new layout is known.
2025-04-14 10:24:47 -07:00
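The role of the addend in #135449 can be checked with a short worked example. This is a sketch that assumes the generic PC-relative relocation formula S + A - P and 4-byte vtable slots; the addresses and function names are made up for illustration.

```python
SLOT_SIZE = 4  # assumed: relative vtable entries are 32-bit offsets


def pc_relative_value(symbol_addr, addend, place):
    """Generic PC-relative relocation value: S + A - P."""
    return symbol_addr + addend - place


def relative_vtable_entry(first_slot_addr, slot_index, func_addr):
    """Value stored in a relative vtable slot for the given virtual function."""
    place = first_slot_addr + slot_index * SLOT_SIZE
    # The addend is the distance between this slot and the first
    # virtual function slot, as the commit message describes.
    addend = place - first_slot_addr
    return pc_relative_value(func_addr, addend, place)


first_slot = 0x5000      # hypothetical address of the first virtual function slot
new_func_addr = 0x9000   # hypothetical function address after the new layout

entries = [relative_vtable_entry(first_slot, i, new_func_addr) for i in range(3)]
# Generated code adds the stored entry to the address of the first slot,
# so every slot must resolve to the same function address.
resolved = [first_slot + entry for entry in entries]
```

Because the addend cancels the slot's own offset, the relocation target is the function itself and no "function + offset" handling is needed.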
Kazu Hirata
7940b0546b [BOLT] Fix warning
This patch fixes:

  bolt/lib/Core/BinaryContext.cpp:582:8: error: unused variable
  'printEntryDiagnostics' [-Werror,-Wunused-variable]

  bolt/lib/Core/BinaryContext.cpp:842:10: error: unused variable
  'isSibling' [-Werror,-Wunused-variable]
2025-04-12 23:35:49 -07:00
Amir Ayupov
fa4ac19f0f [BOLT] Accept PLT fall-throughs as valid traces (#129481)
We used to report PLT traces as invalid (mismatching disassembled
function contents) because PLT functions are marked as pseudo and
ignored, thus missing CFG. However, such traces are not mismatching
the function contents. Accept them without attaching the profile.

Test Plan: updated callcont-fallthru.s
2025-04-11 21:26:19 -07:00
Fangrui Song
c04d9d57ee MCAsmStreamer: Replace the MCInstPrinter * parameter with unique_ptr
... to clarify ownership, aligning with other parameters. Using
`std::unique_ptr` encourages users to manage `createMCInstPrinter` with
a unique_ptr instead of a raw pointer, reducing the risk of memory
leaks.

* llvm-mc: fix a leak and update llvm/test/tools/llvm-mc/disassembler-options.test
* #121078 copied the llvm-mc code to CodeGenTargetMachineImpl and made
  the same mistake. Fixed by 2b8cc651dc

Using unique_ptr requires #include MCInstPrinter.h in a few translation
units.

* Delete a createAsmStreamer overload I deprecated in 2024
* SystemZMCTargetDesc.cpp: rename to `createSystemZAsmStreamer` to fix
  an overload conflict.

Pull Request: https://github.com/llvm/llvm-project/pull/135128
2025-04-10 21:25:35 -07:00
Amir Ayupov
ba93fe97c2 [BOLT][NFC] Simplify getOrCreate/analyze/populate/emitJumpTable (#132108) 2025-04-10 21:17:04 -07:00
Anatoly Trosinenko
2927050dd4 [BOLT] Gadget scanner: refine class names and debug output (NFC) (#135073)
Scanning functions without CFG information as well as the detection of
authentication oracles requires introducing more classes related to
register state analysis. To make the future code easier to understand,
rename several classes beforehand.

To detect authentication oracles, one has to query the properties of
*output* operands of authentication instructions *after* the instruction
is executed - this requires adding another analysis that iterates over
the instructions in reverse order, and a corresponding state class.

As the main difference of the existing `State` class is that it stores
the properties of source register operands of the instructions before
the instruction's execution, rename it to `SrcState` and
`PacRetAnalysis` to `SrcSafetyAnalysis`.

Apply minor adjustments to the debug output along the way.
2025-04-10 20:54:05 +03:00
Anatoly Trosinenko
8521bd2424 [BOLT][AArch64] Handle PAuth call instructions in isIndirectCall (#133227)
Handle `BLRA*` opcodes in AArch64MCPlusBuilder::isIndirectCall, update
getRegUsedAsCallDest accordingly.
2025-04-08 13:23:10 +03:00
Anatoly Trosinenko
2c107238d5 [BOLT] Make DataflowAnalysis::getStateBefore() const (NFC) (#133308) 2025-04-07 13:37:34 +03:00
Anatoly Trosinenko
0fc7aec349 [BOLT] Gadget scanner: detect address materialization and arithmetic (#132540)
In addition to authenticated pointers, consider the contents of a
register safe if it was
* written by PC-relative address computation
* updated by an arithmetic instruction whose input address is safe
2025-04-07 13:13:11 +03:00
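The two rules added in #132540 can be pictured as a per-instruction transfer function over a set of safe registers. The sketch below is a simplified model, not the scanner's implementation: the instruction tuples and kind names ("auth", "pc_rel", "arith", "load") are hypothetical.

```python
def transfer(safe_regs, inst):
    """Return the set of safe registers after executing inst.

    inst is a hypothetical tuple (kind, dst, src).
    """
    kind, dst, src = inst
    out = set(safe_regs)
    if kind in ("auth", "pc_rel"):
        # Authenticated pointers and PC-relative addresses are safe.
        out.add(dst)
    elif kind == "arith" and src in safe_regs:
        # Arithmetic on a safe input address keeps the result safe.
        out.add(dst)
    else:
        # Any other write makes the destination untrusted.
        out.discard(dst)
    return out


program = [
    ("pc_rel", "x0", None),   # address computed relative to PC
    ("arith", "x1", "x0"),    # arithmetic on a safe address
    ("load", "x2", "x3"),     # loaded from memory: not trusted
    ("arith", "x4", "x2"),    # arithmetic on an unsafe input
]

safe = set()
for inst in program:
    safe = transfer(safe, inst)
```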
Maksim Panchenko
e4cbb7780b [BOLT][AArch64] Fix symbolization of unoptimized TLS access (#134332)
TLS relocations may not have a valid BOLT symbol associated with them.
While symbolizing the operand, we were checking for the symbol value,
and since there was no symbol the check resulted in a crash.

Handle TLS case while performing operand symbolization on AArch64.
2025-04-04 11:42:21 -07:00
Paschalis Mpeis
3d24046b33 [BOLT] Skip out-of-range pending relocations (#116964)
When a pending relocation is created it is also marked whether it is
optional or not. It can be optional when such relocation is added as
part of an optimization (i.e., `scanExternalRefs`).

When BOLT tries to `flushPendingRelocations`, it safely skips any
optional relocations that cannot be encoded due to being out of
range. A prerequisite for that is the use of the `-force-patch`
flag. Otherwise, BOLT will bail out with a relevant message.

Background:
BOLT, as part of scanExternalRefs, identifies external references from
calls and creates pending relocations for them. When flushed, these
relocations update the references to point to the optimized functions.
This optimization can be disabled using `--no-scan`.

BOLT can assert if any of these pending relocations cannot be encoded.

This patch does not disable this optimization but instead selectively
applies it given that a pending relocation is optional and `-force-patch`
was enabled.
2025-04-04 17:31:14 +01:00
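The skip-or-bail-out behavior described in #116964 can be sketched as follows. This is an illustration under stated assumptions: the class and function names are hypothetical, and the range check assumes a branch with a signed 26-bit word offset (plus or minus 128 MiB), which is only one of the encodings a real implementation would have to consider.

```python
from dataclasses import dataclass


@dataclass
class PendingRelocation:
    offset: int      # location being patched
    target: int      # address the patched instruction should reference
    optional: bool   # True when the relocation was added by an optimization


# Assumed reach of a branch with a signed 26-bit word offset.
BRANCH_RANGE = 1 << 27


def in_range(rel):
    delta = rel.target - rel.offset
    return -BRANCH_RANGE <= delta < BRANCH_RANGE


def flush_pending_relocations(relocs, force_patch):
    """Apply encodable relocations; skip optional out-of-range ones
    only when force_patch is set, otherwise bail out."""
    applied, skipped = [], []
    for rel in relocs:
        if in_range(rel):
            applied.append(rel)
        elif rel.optional and force_patch:
            skipped.append(rel)
        else:
            raise ValueError(f"relocation at {rel.offset:#x} is out of range")
    return applied, skipped


relocs = [
    PendingRelocation(0x1000, 0x2000, optional=True),
    PendingRelocation(0x1000, 0x1000 + (1 << 28), optional=True),
]
applied, skipped = flush_pending_relocations(relocs, force_patch=True)
```

With `force_patch=False`, the same input raises an error instead of silently skipping the second relocation.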
Rodrigo Rocha
b9891715af [BOLT] Handle generation of compare and jump sequences (#131949)
This patch fixes the following two issues with createCmpJE for
AArch64:
1. Avoid overwriting the value of the input register RegNo by using XZR
as the destination register.
   subs xzr, RegNo, #Imm
   which is equivalent to a simple
   cmp RegNo, #Imm
2. The immediate operand to the Bcc instruction must be EQ instead of
#Imm.

This patch also adds a new function, createCmpJNE, and unit tests for
both createCmpJE and createCmpJNE for X86 and AArch64.
2025-04-03 18:34:24 -07:00
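The first fix in #131949 relies on `subs` with XZR as the destination behaving like `cmp`. The sketch below is a toy model of that behavior, tracking only the Z flag; the register-file dictionary and function names are hypothetical and exist only to show that the input register is left untouched.

```python
MASK64 = (1 << 64) - 1


def subs(regs, dst, src, imm):
    """Toy model of SUBS: compute src - imm and report the Z flag.

    Writes to the zero register 'xzr' are discarded.
    """
    result = (regs[src] - imm) & MASK64
    if dst != "xzr":
        regs[dst] = result
    return {"Z": result == 0}


def cmp_je_taken(regs, reg, imm):
    """Model of the compare-and-jump-if-equal sequence."""
    flags = subs(regs, "xzr", reg, imm)  # same effect as: cmp reg, #imm
    return flags["Z"]                    # b.eq is taken when Z is set


regs = {"x0": 42}
taken_equal = cmp_je_taken(regs, "x0", 42)
taken_other = cmp_je_taken(regs, "x0", 7)
```

Had the destination been the input register itself, the second comparison would have seen 0 instead of 42.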
Anatoly Trosinenko
c818ae7399 [BOLT] Gadget scanner: detect non-protected indirect calls (#131899)
Implement the detection of non-protected indirect calls and branches
similar to pac-ret scanner.
2025-04-03 16:40:34 +03:00
Alexey Moksyakov
19a319667b [bolt][aarch64] Adding test with unsupported indirect branches (#127655)
This test contains a set of common indirect branch patterns.
Support for them will be added step by step.
2025-04-01 13:49:09 +03:00
Maksim Panchenko
b2d272ccfb [BOLT][X86] Fix getTargetSymbol() (#133834)
In 96e5ee2, I inadvertently broke the way non-trivial symbol references
got updated from non-optimized code. The breakage was a consequence of
`getTargetSymbol(MCExpr *)` not returning a symbol when the parameter
was a binary expression. Fix `getTargetSymbol()` to cover such cases.
2025-03-31 18:31:33 -07:00
Kazu Hirata
0c7be9392f [BOLT] Use *Set::insert_range (NFC) (#133601) 2025-03-29 16:52:16 -07:00
Paschalis Mpeis
427725508b [BOLT] Add getter for optional relocations (#133085)
Minor refactoring on comments.
2025-03-28 14:07:51 +00:00
Maksim Panchenko
96e5ee23a7 [BOLT][AArch64] Add partial support for lite mode (#133014)
In lite mode, we only emit code for a subset of functions while
preserving the original code in .bolt.org.text. This requires updating
code references in non-emitted functions to ensure that:

* Non-optimized versions of the optimized code never execute.
* Function pointer comparison semantics is preserved.

On x86-64, we can update code references in-place using "pending
relocations" added in scanExternalRefs(). However, on AArch64, this is
not always possible due to address range limitations and linker address
"relaxation".

There are two types of code-to-code references: control transfer (e.g.,
calls and branches) and function pointer materialization.
AArch64-specific control transfer instructions are covered by #116964.

For function pointer materialization, simply changing the immediate
field of an instruction is not always sufficient. In some cases, we need
to modify a pair of instructions, such as undoing linker relaxation and
converting NOP+ADR into ADRP+ADD sequence.

To achieve this, we use the instruction patch mechanism instead of
pending relocations. Instruction patches are emitted via the regular MC
layer, just like regular functions. However, they have a fixed address
and do not have an associated symbol table entry. This allows us to make
more complex changes to the code, ensuring that function pointers are
correctly updated. Such mechanism should also be portable to RISC-V and
other architectures.

To summarize, for AArch64, we extend the scanExternalRefs() process to
undo linker relaxation and use instruction patches to partially
overwrite unoptimized code.
2025-03-27 21:33:25 -07:00
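The need to convert NOP+ADR back into ADRP+ADD in #133014 follows from the reach of the two forms: ADR encodes a byte offset of roughly plus or minus 1 MiB, while ADRP works on 4 KiB pages and is completed by an ADD of the low 12 bits. The arithmetic can be sketched as below; the helper names and addresses are hypothetical.

```python
PAGE_SHIFT = 12
PAGE_SIZE = 1 << PAGE_SHIFT


def adr_reaches(pc, target):
    """ADR encodes a signed 21-bit byte offset (about +/-1 MiB)."""
    return -(1 << 20) <= target - pc < (1 << 20)


def adrp_add_operands(pc, target):
    """Split target into the ADRP page delta and the ADD low-12-bit offset."""
    page_delta = (target >> PAGE_SHIFT) - (pc >> PAGE_SHIFT)
    lo12 = target & (PAGE_SIZE - 1)
    return page_delta, lo12


def materialize(pc, page_delta, lo12):
    """Address produced by ADRP (page of pc plus delta) followed by ADD."""
    return ((pc >> PAGE_SHIFT) + page_delta) * PAGE_SIZE + lo12


pc = 0x400000            # hypothetical address of the patched instruction
target = 0x12345678      # hypothetical address of the optimized function

needs_pair = not adr_reaches(pc, target)
page_delta, lo12 = adrp_add_operands(pc, target)
rebuilt = materialize(pc, page_delta, lo12)
```

Here the optimized function is too far away for a single ADR, so both instructions have to be rewritten, which is what the instruction patch mechanism makes possible.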
Ash Dobrescu
a308d421aa Remove -no-pie case from indirect-goto-relocs.test (#133067)
This test was added in PR:
https://github.com/llvm/llvm-project/pull/120267. The -no-pie case in
the above mentioned test needs to be removed as subsequent changes have
caused it to fail.
2025-03-26 11:11:55 +00:00
Anatoly Trosinenko
b6b40e9ac9 [BOLT] Gadget scanner: reformulate the state for data-flow analysis (#131898)
In preparation for implementing support for detection of non-protected
call instructions, refine the definition of state which is computed for
each register by data-flow analysis.

Explicitly marking the registers which are known to be trusted at
function entry is crucial for finding non-protected calls. In addition,
it fixes less-common false negatives for pac-ret, such as `ret x1` in
`f_nonx30_ret_non_auted` test case.
2025-03-25 21:45:02 +03:00
Kazu Hirata
993311799b [BOLT] Fix a warning
This patch fixes:

  bolt/lib/Passes/PAuthGadgetScanner.cpp:438:18: error: unused
  variable 'BC' [-Werror,-Wunused-variable]
2025-03-21 11:08:27 -07:00
Anatoly Trosinenko
72d1058af0 [BOLT] Gadget scanner: refactor analysis of RET instructions (#131897)
In preparation for implementing detection of more gadget kinds,
refactor checking for non-protected return instructions.
2025-03-21 19:54:57 +03:00
Paschalis Mpeis
6bbd45dec7 [NFC][BOLT] Refactor ForcePatch option (#127812)
Move force-patch flag to CommandLineOpts and add details on
PatchEntries.
2025-03-21 15:55:09 +00:00
Anatoly Trosinenko
03557169e0 [BOLT] Gadget scanner: streamline issue reporting (#131896)
In preparation for adding more gadget kinds to detect, streamline
issue reporting.

Rename classes representing issue reports. In particular, rename
`Annotation` base class to `Report`, as it has nothing to do with
"annotations" in `MCPlus` terms anymore. Remove references to "return
instructions" from variable names and report messages, use generic
terms instead. Rename NonPacProtectedRetAnalysis to PAuthGadgetScanner.

Remove `GeneralDiagnostic` as a separate class, make `GenericReport`
(former `GenDiag`) store `std::string Text` directly. Remove unused
`operator=` and `operator==` methods, as `Report`s are created on the
heap and referenced via `shared_ptr`s.

Introduce `GadgetKind` class - currently, it only wraps a `const char *`
description to display to the user. This description is intended to be
a per-gadget-kind constant (or a few hard-coded constants), so no need
to store it to `std::string` field in each report instance. To handle
both free-form `GenericReport`s and statically-allocated messages
without unnecessary overhead, move printing of the report header to the
base class (and take the message argument as a `StringRef`).
2025-03-21 11:19:53 +03:00
Fangrui Song
42a8813757 [RISCV] Rename VariantKind to Specifier
Follow the X86 and Mips renaming.

> "Relocation modifier" suggests adjustments happen during the linker's relocation step rather than the assembler's expression evaluation.
> "Relocation specifier" is clear, aligns with Arm and IBM AIX's documentation, and fits the assembler's role seamlessly.

In addition, rename *MCExpr::getKind, which confusingly shadows the base class getKind.
2025-03-20 22:25:57 -07:00
Paschalis Mpeis
5f6d9b45e9 [BOLT] Make Relocations a class and add optional field (#131638)
This patch converts `Relocations` from a struct to a class, and
introduces the `Optional` field. Patch #116964 will use it.

Some optimizations, like `scanExternalRefs`, create relocations that
patch the old code. Under certain circumstances these may be skipped
without correctness implications.
2025-03-20 17:16:14 +00:00
Kazu Hirata
10624e67c3 [BOLT] Fix warnings
bolt/lib/Passes/NonPacProtectedRetAnalysis.cpp:62:13: error: unused
  function 'traceInst' [-Werror,-Wunused-function]

  bolt/lib/Passes/NonPacProtectedRetAnalysis.cpp:68:13: error: unused
  function 'traceReg' [-Werror,-Wunused-function]

  bolt/lib/Passes/NonPacProtectedRetAnalysis.cpp:80:13: error: unused
  function 'traceRegMask' [-Werror,-Wunused-function]
2025-03-20 10:12:46 -07:00
Anatoly Trosinenko
482b95217e [BOLT] Gadget scanner: factor out utility code (#131895)
Factor out the code for mapping from physical registers to consecutive
array indexes.

Introduce helper functions to print instructions and registers to
prevent mixing of analysis logic and implementation details of debug
output.

Removed the debug printing from `Gadget::generateReport`, as it doesn't
seem to add important information to what was already printed in the
report itself.
2025-03-20 19:35:31 +03:00
Ash Dobrescu
3bba268013 [BOLT] Support computed goto and allow map addrs inside functions (#120267)
Create entry points for addresses referenced by dynamic relocations and
allow getNewFunctionOrDataAddress to map addresses inside functions. By
adding addresses referenced by dynamic relocations as entry points, this
patch fixes an issue where BOLT fails on code using computed gotos.
This also fixes a mapping issue with the bugfix from this PR:
https://github.com/llvm/llvm-project/pull/117766.
2025-03-19 14:55:59 +00:00