With the --force-patch option, every original function entry point is
overwritten with a trampoline to the new version of the function to
prevent execution of the original code.
If the function size is too small for the trampoline code, we are forced
to bail out on rewriting the function. That presented a problem on
AArch64 due to the LongJmp pass, which assumed the presence of the new
copy of the function. If the new copy was not emitted, it could have led
to a relocation overflow.
Run the PatchEntries pass before LongJmp and make the latter aware of the
functions that are not going to be emitted. This makes --force-patch
behavior on AArch64 consistent with other architectures.
Besides virtual function pointers, a vtable can contain other kinds of
entries, such as those for RTTI data, that also require relocations. We
need to skip the special handling of relocations for non-virtual-function
pointers in a relative vtable.
Co-authored-by: Maksim Panchenko <maks@meta.com>
See https://discourse.llvm.org/t/rfc-keep-globalvalue-guids-stable/84801
for context.
This is a non-functional change which just changes the interface of
GlobalValue, in preparation for future functional changes. This part
touches a fair few users, so it is split out for ease of review. Future
changes to the GlobalValue implementation can then be focused purely on
that class.
This does the following:
* Rename GlobalValue::getGUID(StringRef) to
getGUIDAssumingExternalLinkage. This is simply making explicit at the
callsite what is currently implicit.
* Where possible, migrate users to directly calling getGUID on a
GlobalValue instance.
* Otherwise, where possible, have them call the newly renamed
getGUIDAssumingExternalLinkage, to make the assumption explicit.
There are a few cases where neither of the above are possible, as the
caller saves and reconstructs the necessary information to compute the
GUID themselves. We want to migrate these callers eventually, but for
this first step we leave them be.
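To illustrate, here is a minimal sketch of the two call patterns after the
rename, assuming a `GlobalValue` or a bare name is in hand; the wrapper
functions are purely illustrative.
```cpp
#include "llvm/ADT/StringRef.h"
#include "llvm/IR/GlobalValue.h"
using namespace llvm;

// Preferred form: ask the GlobalValue itself, so its actual linkage is
// taken into account.
GlobalValue::GUID guidOfValue(const GlobalValue &GV) { return GV.getGUID(); }

// Name-only form: the renamed helper makes the external-linkage assumption
// explicit at the call site.
GlobalValue::GUID guidOfName(StringRef Name) {
  return GlobalValue::getGUIDAssumingExternalLinkage(Name);
}
```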
Despite our attempt (build-docs.sh)
to build the documentation with SVG,
it still uses PNG (see https://llvm.org/doxygen/classllvm_1_1StringRef.html),
which renders terribly on any high-DPI display.
SVG leads to a smaller installation and works fine
in all browsers (that has been true for _a while_,
see https://caniuse.com/svg), so this patch just unconditionally builds all
dot graphs as SVG in all subprojects and removes the option.
DataAggregator supports reading different kinds of profile data:
- perf data: branch records or IP samples,
- pre-aggregated branch data.
Make profile quality reporting uniform across all kinds of input:
- out-of-range and mismatching samples,
- samples in cold code in BAT mode (profiled BOLTed binary).
Test Plan: NFCI
Improve profile quality reporting by 1) fixing a format issue for small
binaries, 2) adding new stats for exception handling usage, 3) excluding
selected blocks when computing the CFG flow conservation score.
More specifically for 3), we are excluding blocks that satisfy at least
one of the following characteristics: a) is a landing pad, b) has at
least one landing pad with non-zero execution counts, c) ends with a
recursive call. The reason for a) and b) is that the thrower -->
landing pad edges are not explicitly represented in the CFG. The reason
for c) is that the call-continuation fallthrough edge count is not
important in the case of recursive calls.
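A self-contained sketch of the exclusion rule for 3); the block model below
is illustrative and does not mirror BOLT's BinaryBasicBlock interface.
```cpp
#include <cstdint>
#include <vector>

// Just enough per-block state to express conditions a)-c).
struct BlockInfoSketch {
  bool IsLandingPad = false;              // a)
  std::vector<uint64_t> LandingPadCounts; // execution counts of its landing pads, b)
  bool EndsWithRecursiveCall = false;     // c)
};

// A block is left out of the flow conservation score if any of a)-c) holds.
bool excludeFromFlowConservation(const BlockInfoSketch &BB) {
  if (BB.IsLandingPad)
    return true;
  for (uint64_t Count : BB.LandingPadCounts)
    if (Count > 0)
      return true;
  return BB.EndsWithRecursiveCall;
}
```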
Modified test `bolt/test/X86/profile-quality-reporting.test`.
Added test `bolt/test/X86/profile-quality-reporting-small-binary.s`.
Add an advanced-user flag so we are able to rewrite binaries when we fail
to identify a suitable location to put new code. The user can then supply a
custom location via --custom-allocation-vma. This happens most notably when
the binary has segments mapped to very high addresses.
We will increase the use of raw relocation types and eliminate fixup
kinds that correspond to relocation types. The getFixupKindInfo
functions will return an rvalue instead. Let's update the return type
from a const reference to a value type.
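A hedged before/after sketch of the interface shape; the backend class and
table below are illustrative rather than an actual LLVM target.
```cpp
// Illustrative stand-in for llvm::MCFixupKindInfo.
struct FixupKindInfoSketch {
  const char *Name;
  unsigned TargetOffset;
  unsigned TargetSize;
  unsigned Flags;
};

class AsmBackendSketch {
public:
  // Before: `const FixupKindInfoSketch &getFixupKindInfo(unsigned Kind) const;`
  // handed out a reference into a backend-owned static table.
  // After: returning by value means raw relocation types no longer need a
  // permanent table entry.
  FixupKindInfoSketch getFixupKindInfo(unsigned Kind) const {
    static const FixupKindInfoSketch Table[] = {
        {"FK_NONE", 0, 0, 0},
        {"FK_Data_4", 0, 32, 0},
        {"FK_Data_8", 0, 64, 0},
    };
    return Kind < 3 ? Table[Kind] : FixupKindInfoSketch{"raw", 0, 0, 0};
  }
};
```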
Patch functions are used to fix instructions in the original code, i.e.,
they are not functions in a traditional sense, but rather pieces of
emitted code that are embedded into real functions.
We used to emit FDEs for all functions, including patch functions.
However, FDEs for patches are not only unnecessary, but they can lead to
problems with libraries and runtimes that consume FDEs, e.g., the C++
exception handling runtime.
Note that we use named patches to fix function entry points and in that
case they behave more like regular functions. Thus we issue FDEs for
those.
a) Due to the different capabilities of the implemented functions,
rename the createCmpJE function.
b) Refactor the convertIndirectCallToLoad function to override the
interface.
Patch by WangJee, originally posted in #136129
This patch adds code generation for RISCV64 instrumentation. The work
involved includes the following three points:
a) Implements support for instrumenting direct function calls and jumps
on RISC-V. This relies on atomic instructions (used to increment
counters), which are only available on RISC-V when the A extension is
used.
b) Implements support for instrumenting indirect function calls
by implementing the createInstrumentedIndCallHandlerEntryBB and
createInstrumentedIndCallHandlerExitBB interfaces. In this process, we
need to accurately record the target address and IndCallID to ensure
the correct recording of the indirect call counters.
c) Implements the RISCV64 BOLT runtime library, with some system call
interfaces implemented through inline assembly. It computes the
difference between the runtime address of the .text section and its
static address in the section header table, which in turn can be used
to look up the indirect call description.
However, the community code currently has problems with relocations in
some scenarios; these have nothing to do with instrumentation. We may
continue to submit patches to fix the related bugs.
Some functions have their size set to zero in the input binary's symbol
table, such as those compiled from assembly. When figuring out function
sizes, we may create a label symbol if it doesn't point to any constant
island. However, before the function size is known, a marker symbol
cannot be correctly associated with a function, so all such checks
would fail and we could end up adding a code label pointing to a
constant island as a secondary entry point and later mistakenly
marking the function as not simple.
Querying the global marker symbol array has a big throughput overhead.
Instead, we can run an extra check when post-processing entry points
to identify label symbols that actually point to constant islands.
To handle relative vtables, which are enabled with the clang option
`-fexperimental-relative-c++-abi-vtables`, we look for PC-relative
relocations whose fixup locations fall in vtable address ranges.
For such relocations, the actual target is the virtual function itself,
and the addend records the distance between the vtable slot for the
target virtual function and the first virtual function slot in the
vtable, matching the generated code that calls the virtual function. So
we can skip the "function + offset" handling logic and directly
save such relocations for later fixup once the new layout is known.
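A self-contained sketch of that shortcut; the types and helper here are
illustrative, not BOLT's actual relocation-processing code.
```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

struct RelativeVTableReloc {
  uint64_t FixupAddress;  // location inside the relative vtable
  std::string TargetFunc; // the virtual function itself
  int64_t Addend;         // distance from the first virtual function slot
};

struct AddressRange {
  uint64_t Begin, End;
  bool contains(uint64_t Addr) const { return Addr >= Begin && Addr < End; }
};

// If the fixup falls inside a vtable's range, skip the usual
// "function + offset" target resolution and save the relocation as-is for
// fixup once the new layout is known.
bool saveIfRelativeVTableReloc(const AddressRange &VTable,
                               uint64_t FixupAddress, std::string TargetFunc,
                               int64_t Addend,
                               std::vector<RelativeVTableReloc> &Saved) {
  if (!VTable.contains(FixupAddress))
    return false; // fall through to regular relocation handling
  Saved.push_back({FixupAddress, std::move(TargetFunc), Addend});
  return true;
}
```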
We used to report PLT traces as invalid (mismatching disassembled
function contents) because PLT functions are marked as pseudo and
ignored, and thus have no CFG. However, such traces do not mismatch
the function contents. Accept them without attaching the profile.
Test Plan: updated callcont-fallthru.s
... to clarify ownership, aligning with other parameters. Using
`std::unique_ptr` encourages users to manage the result of
`createMCInstPrinter` with a unique_ptr instead of a raw pointer,
reducing the risk of memory leaks.
* llvm-mc: fix a leak and update llvm/test/tools/llvm-mc/disassembler-options.test
* #121078 copied the llvm-mc code to CodeGenTargetMachineImpl and made
the same mistake. Fixed by 2b8cc651dc
Using unique_ptr requires #include MCInstPrinter.h in a few translation
units.
* Delete a createAsmStreamer overload I deprecated in 2024
* SystemZMCTargetDesc.cpp: rename to `createSystemZAsmStreamer` to fix
an overload conflict.
Pull Request: https://github.com/llvm/llvm-project/pull/135128
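A minimal sketch of the intended call-site pattern, assuming the usual MC
objects have already been created; real tools differ in how they obtain them.
```cpp
#include "llvm/MC/MCAsmInfo.h"
#include "llvm/MC/MCInstPrinter.h"
#include "llvm/MC/MCInstrInfo.h"
#include "llvm/MC/MCRegisterInfo.h"
#include "llvm/MC/TargetRegistry.h"
#include "llvm/TargetParser/Triple.h"
#include <memory>
using namespace llvm;

// Hold the printer in a unique_ptr so ownership (and any later transfer to a
// streamer) is explicit instead of relying on a raw pointer.
std::unique_ptr<MCInstPrinter> makeInstPrinter(const Target &T,
                                               const Triple &TT,
                                               const MCAsmInfo &MAI,
                                               const MCInstrInfo &MII,
                                               const MCRegisterInfo &MRI) {
  return std::unique_ptr<MCInstPrinter>(
      T.createMCInstPrinter(TT, MAI.getAssemblerDialect(), MAI, MII, MRI));
}
```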
Scanning functions without CFG information, as well as detecting
authentication oracles, requires introducing more classes related to
register state analysis. To make the future code easier to understand,
rename several classes beforehand.
To detect authentication oracles, one has to query the properties of
*output* operands of authentication instructions *after* the instruction
is executed - this requires adding another analysis that iterates over
the instructions in reverse order, and a corresponding state class.
As the main distinguishing feature of the existing `State` class is that
it stores the properties of an instruction's source register operands
before the instruction is executed, rename it to `SrcState`, and rename
`PacRetAnalysis` to `SrcSafetyAnalysis`.
Apply minor adjustments to the debug output along the way.
In addition to authenticated pointers, consider the contents of a
register safe if it was
* written by a PC-relative address computation
* updated by an arithmetic instruction whose input address is safe
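A hedged sketch of the extended rule; the enum and helper below are
illustrative, not the scanner's actual SrcSafetyAnalysis interface.
```cpp
// Kinds of writes that can define a register, as far as this rule is concerned.
enum class RegWriteKind {
  AuthenticatedPointer, // result of an authentication instruction
  PCRelativeAddress,    // e.g. an ADR/ADRP-style address computation
  ArithmeticOnInput,    // arithmetic deriving the value from an input register
  Other,
};

// The register is safe after an authenticated pointer or a PC-relative
// address computation, and after arithmetic only if its input was already safe.
bool isSafeAfterWrite(RegWriteKind Kind, bool InputWasSafe) {
  switch (Kind) {
  case RegWriteKind::AuthenticatedPointer:
  case RegWriteKind::PCRelativeAddress:
    return true;
  case RegWriteKind::ArithmeticOnInput:
    return InputWasSafe;
  case RegWriteKind::Other:
    return false;
  }
  return false;
}
```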
TLS relocations may not have a valid BOLT symbol associated with them.
While symbolizing the operand, we were checking for the symbol value,
and since there was no symbol, the check resulted in a crash.
Handle the TLS case while performing operand symbolization on AArch64.
When a pending relocation is created, it is also marked as optional or
not. It can be optional when the relocation is added as part of an
optimization (i.e., `scanExternalRefs`).
When BOLT tries to `flushPendingRelocations`, it safely skips any
optional relocations that cannot be encoded due to being out of
range. A prerequisite for that is the use of the `-force-patch`
flag. Otherwise, BOLT bails out with a relevant message.
Background:
BOLT, as part of scanExternalRefs, identifies external references from
calls and creates some pending relocations for them. When flushed, these
update references to point to the optimized functions.
This optimization can be disabled using `--no-scan`.
BOLT can assert if any of these pending relocations cannot be encoded.
This patch does not disable this optimization but instead applies it
selectively, given that a pending relocation is optional and `-force-patch`
was enabled.
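A self-contained sketch of the flush-time behavior described above; the types
and helpers are illustrative, not BOLT's actual flushPendingRelocations().
```cpp
#include <cstdint>
#include <vector>

struct PendingRelocSketch {
  int64_t Displacement = 0; // value that must fit the instruction's encoding
  bool Optional = false;    // added by an optimization such as scanExternalRefs
};

// Stand-in for the range check done by the real encoder (e.g. a 26-bit
// branch immediate scaled by 4 gives a +/-128 MiB range on AArch64).
bool fitsEncoding(const PendingRelocSketch &R) {
  return R.Displacement >= -(128 << 20) && R.Displacement < (128 << 20);
}

// Flush the pending relocations: optional ones that cannot be encoded are
// skipped when -force-patch is in effect; otherwise we bail out.
bool flushPendingSketch(const std::vector<PendingRelocSketch> &Relocs,
                        bool ForcePatch) {
  for (const PendingRelocSketch &R : Relocs) {
    if (fitsEncoding(R))
      continue; // apply the relocation (elided here)
    if (R.Optional && ForcePatch)
      continue; // safe to skip: only an optimization opportunity is lost
    return false; // bail out with a relevant message
  }
  return true;
}
```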
This patch fixes the following two issues with the createCmpJE for
AArch64:
1. Avoids overwriting the value of the input register RegNo by using XZR
as the destination register:
subs xzr, RegNo, #Imm
which is equivalent to a simple
cmp RegNo, #Imm
2. The condition operand of the Bcc instruction must be EQ instead of
#Imm.
This patch also adds a new createCmpJNE function and unit tests for
both createCmpJE and createCmpJNE on X86 and AArch64.
In 96e5ee2, I inadvertently broke the way non-trivial symbol references
got updated from non-optimized code. The breakage was a consequence of
`getTargetSymbol(MCExpr *)` not returning a symbol when the parameter
was a binary expression. Fix `getTargetSymbol()` to cover such cases.
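A hedged sketch of the idea using LLVM's MC expression classes; BOLT's actual
`getTargetSymbol()` lives in MCPlusBuilder and differs in detail.
```cpp
#include "llvm/MC/MCExpr.h"
#include "llvm/Support/Casting.h"
using namespace llvm;

// Return the symbol referenced by an expression, looking through binary
// expressions such as "Sym + Const" instead of giving up on them.
static const MCSymbol *getTargetSymbolSketch(const MCExpr *Expr) {
  if (const auto *SymExpr = dyn_cast<MCSymbolRefExpr>(Expr))
    return &SymExpr->getSymbol();
  if (const auto *BinExpr = dyn_cast<MCBinaryExpr>(Expr))
    return getTargetSymbolSketch(BinExpr->getLHS());
  return nullptr;
}
```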
In lite mode, we only emit code for a subset of functions while
preserving the original code in .bolt.org.text. This requires updating
code references in non-emitted functions to ensure that:
* Non-optimized versions of the optimized code never execute.
* Function pointer comparison semantics is preserved.
On x86-64, we can update code references in-place using "pending
relocations" added in scanExternalRefs(). However, on AArch64, this is
not always possible due to address range limitations and linker address
"relaxation".
There are two types of code-to-code references: control transfer (e.g.,
calls and branches) and function pointer materialization.
AArch64-specific control transfer instructions are covered by #116964.
For function pointer materialization, simply changing the immediate
field of an instruction is not always sufficient. In some cases, we need
to modify a pair of instructions, such as undoing linker relaxation and
converting NOP+ADR into an ADRP+ADD sequence.
To achieve this, we use the instruction patch mechanism instead of
pending relocations. Instruction patches are emitted via the regular MC
layer, just like regular functions. However, they have a fixed address
and do not have an associated symbol table entry. This allows us to make
more complex changes to the code, ensuring that function pointers are
correctly updated. Such a mechanism should also be portable to RISC-V
and other architectures.
To summarize, for AArch64, we extend the scanExternalRefs() process to
undo linker relaxation and use instruction patches to partially
overwrite unoptimized code.
In preparation for implementing support for detection of non-protected
call instructions, refine the definition of state which is computed for
each register by data-flow analysis.
Explicitly marking the registers which are known to be trusted at
function entry is crucial for finding non-protected calls. In addition,
it fixes less-common false negatives for pac-ret, such as `ret x1` in
the `f_nonx30_ret_non_auted` test case.
In preparation for adding more gadget kinds to detect, streamline
issue reporting.
Rename classes representing issue reports. In particular, rename
`Annotation` base class to `Report`, as it has nothing to do with
"annotations" in `MCPlus` terms anymore. Remove references to "return
instructions" from variable names and report messages, use generic
terms instead. Rename NonPacProtectedRetAnalysis to PAuthGadgetScanner.
Remove `GeneralDiagnostic` as a separate class, make `GenericReport`
(former `GenDiag`) store `std::string Text` directly. Remove unused
`operator=` and `operator==` methods, as `Report`s are created on the
heap and referenced via `shared_ptr`s.
Introduce the `GadgetKind` class - currently, it only wraps a `const char *`
description to display to the user. This description is intended to be
a per-gadget-kind constant (or a few hard-coded constants), so there is no
need to store it in a `std::string` field in each report instance. To handle
both free-form `GenericReport`s and statically-allocated messages
without unnecessary overhead, move printing of the report header to the
base class (and take the message argument as a `StringRef`).
Follow the X86 and Mips renaming.
> "Relocation modifier" suggests adjustments happen during the linker's relocation step rather than the assembler's expression evaluation.
> "Relocation specifier" is clear, aligns with Arm and IBM AIX's documentation, and fits the assembler's role seamlessly.
In addition, rename *MCExpr::getKind, which confusingly shadows the base class getKind.
This patch converts `Relocations` from a struct to a class, and
introduces the `Optional` field. Patch #116964 will use it.
Some optimizations, like `scanExternalRefs`, create relocations that
patch the old code. Under certain circumstances these may be skipped
without correctness implications.
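A hedged sketch of the shape of the change; apart from `Optional`, the
members shown are illustrative rather than a copy of BOLT's Relocation
definition.
```cpp
#include <cstdint>

namespace llvm { class MCSymbol; }

class RelocationSketch {
public:
  uint64_t Offset = 0;
  llvm::MCSymbol *Symbol = nullptr;
  uint32_t Type = 0;
  uint64_t Addend = 0;

  // Optional relocations patch old code on behalf of an optimization (e.g.
  // scanExternalRefs) and may be skipped if they cannot be encoded.
  bool isOptional() const { return Optional; }
  void setOptional() { Optional = true; }

private:
  bool Optional = false;
};
```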
Factor out the code for mapping from physical registers to consecutive
array indexes.
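A self-contained sketch of the factored-out mapping; the class name and
interface are illustrative.
```cpp
#include <cstdint>
#include <vector>

// Map sparse physical register numbers to consecutive array indexes so that
// per-register analysis state can live in a dense vector.
class TrackedRegistersSketch {
  std::vector<int> IndexOf;    // physical register -> dense index, -1 if untracked
  std::vector<uint16_t> RegAt; // dense index -> physical register
public:
  explicit TrackedRegistersSketch(unsigned NumPhysRegs)
      : IndexOf(NumPhysRegs, -1) {}

  unsigned getOrCreateIndex(uint16_t Reg) {
    if (IndexOf[Reg] < 0) {
      IndexOf[Reg] = static_cast<int>(RegAt.size());
      RegAt.push_back(Reg);
    }
    return static_cast<unsigned>(IndexOf[Reg]);
  }

  unsigned numTracked() const { return static_cast<unsigned>(RegAt.size()); }
  uint16_t regAtIndex(unsigned Idx) const { return RegAt[Idx]; }
};
```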
Introduce helper functions to print instructions and registers to
prevent mixing of analysis logic and implementation details of debug
output.
Remove the debug printing from `Gadget::generateReport`, as it doesn't
seem to add important information beyond what is already printed in the
report itself.
Create entry points for addresses referenced by dynamic relocations and
allow getNewFunctionOrDataAddress to map addresses inside functions. By
adding addresses referenced by dynamic relocations as entry points, this
patch fixes an issue where BOLT fails on code using computed gotos.
This also fixes a mapping issue with the bugfix from this PR:
https://github.com/llvm/llvm-project/pull/117766.