clang-p2996/lld/ELF/Arch/TargetImpl.h
Peter Collingbourne 494a74882b Reapply "ELF: Add branch-to-branch optimization."
Fixed assertion failure when reading .eh_frame sections, and added
.eh_frame sections to tests.

This reverts commit 1e95349dbe.

Original commit message follows:

When code calls a function which then immediately tail calls another
function there is no need to go via the intermediate function. By
branching directly to the target function we reduce the program's working
set for a slight increase in runtime performance.
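
As a rough source-level illustration of the pattern described above, a function whose entire body is a tail call to another function might look like the following. The names `target`, `forwarder` and `caller` are hypothetical and do not come from this commit, and a compiler may well inline such small functions when they share a translation unit; the sketch only shows the call shape.

```cpp
// Hypothetical example: `forwarder` does nothing except forward to `target`.
// When it is not inlined, a compiler may lower its body to a single branch
// instruction (a tail call) instead of a call followed by a return.
int target(int x) { return x * 2 + 1; }

int forwarder(int x) { return target(x); }

// A caller that references `forwarder` directly. Conceptually, the
// branch-to-branch optimization lets the linker point this call's relocation
// at `target`, skipping the intermediate branch. The result is unchanged.
int caller(int x) { return forwarder(x); }
```

Calling `caller(20)` yields the same value whether control passes through `forwarder` or goes straight to `target`; only the path taken differs.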

Normally it is relatively uncommon to have functions that just tail call
another function, but with LLVM control flow integrity we have jump tables
that replace the function itself as the canonical address. As a result,
when a function address is taken and called directly, for example after
a compiler optimization resolves the indirect call, or if code built
without control flow integrity calls the function, the call will go via
the jump table.

The impact of this optimization was measured using a large internal
Google benchmark. The results were as follows:

CFI enabled:  +0.1% ± 0.05% queries per second
CFI disabled: +0.01% queries per second [not statistically significant]

The optimization is enabled by default at -O2 but may also be enabled
or disabled individually with --{,no-}branch-to-branch.

This optimization is implemented for AArch64 and X86_64 only.

lld's runtime performance (real execution time) after adding this
optimization was measured using firefox-x64 from lld-speed-test [1]
with ldflags "-O2 -S" on an Apple M2 Ultra. The results are as follows:

```
    N           Min           Max        Median           Avg        Stddev
x 512     1.2264546     1.3481076     1.2970261     1.2965788   0.018620888
+ 512     1.2561196     1.3839965     1.3214632     1.3209327   0.019443971
Difference at 95.0% confidence
        0.0243538 +/- 0.00233202
        1.87831% +/- 0.179859%
        (Student's t, pooled s = 0.0190369)
```
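
The summary lines of that output can be re-derived from the two rows of the table. The sketch below is an assumption about how the figures relate, not the benchmarking tool's own code: it uses the textbook pooled standard deviation and assumes a critical value of about 1.96 (the large-sample limit for a 95% interval). The `Sample` struct and function names are hypothetical.

```cpp
#include <cmath>

// Summary statistics for one row of the table (hypothetical helper type).
struct Sample {
  double n;      // number of runs
  double avg;    // mean wall-clock time in seconds
  double stddev; // sample standard deviation
};

// Pooled standard deviation of two samples.
double pooledStddev(const Sample &a, const Sample &b) {
  double num =
      (a.n - 1) * a.stddev * a.stddev + (b.n - 1) * b.stddev * b.stddev;
  return std::sqrt(num / (a.n + b.n - 2));
}

// Half-width of the confidence interval for the difference of the means.
// `critical` is the t critical value, assumed to be roughly 1.96 here.
double halfWidth(const Sample &a, const Sample &b, double critical) {
  return critical * pooledStddev(a, b) * std::sqrt(1.0 / a.n + 1.0 / b.n);
}

// Difference of the means as a percentage of the baseline mean.
double percentDifference(const Sample &base, const Sample &changed) {
  return 100.0 * (changed.avg - base.avg) / base.avg;
}
```

With the `x` row as the baseline and the `+` row as the changed build, these functions reproduce the reported pooled s, the +/- half-width and the percentage difference to within rounding of the printed averages.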

[1] https://discourse.llvm.org/t/improving-the-reproducibility-of-linker-benchmarking/86057

Reviewers: zmodem, MaskRay

Reviewed By: MaskRay

Pull Request: https://github.com/llvm/llvm-project/pull/145579
2025-06-24 22:16:18 -07:00

//===----------------------------------------------------------------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
#ifndef LLD_ELF_ARCH_TARGETIMPL_H
#define LLD_ELF_ARCH_TARGETIMPL_H
#include "InputFiles.h"
#include "InputSection.h"
#include "Relocations.h"
#include "Symbols.h"
#include "llvm/BinaryFormat/ELF.h"
namespace lld::elf {
// getControlTransferAddend: If this relocation is used for control transfer
// instructions (e.g. branch, branch-link or call) or code references (e.g.
// virtual function pointers) and indicates an address-insignificant reference,
// return the effective addend for the relocation, otherwise return
// std::nullopt. The effective addend for a relocation is the addend that is
// used to determine its branch destination.
//
// getBranchInfoAtTarget: If a control transfer relocation referring to
// is+offset directly transfers control to a relocated branch instruction in the
// specified section, return the relocation for the branch target as well as its
// effective addend (see above). Otherwise return {nullptr, 0}.
//
// redirectControlTransferRelocations: Given r1, a relocation for which
// getControlTransferAddend() returned a value, and r2, a relocation returned by
// getBranchInfoAtTarget(), modify r1 so that it branches directly to the target
// of r2.
template <typename GetControlTransferAddend, typename GetBranchInfoAtTarget,
          typename RedirectControlTransferRelocations>
inline void applyBranchToBranchOptImpl(
    Ctx &ctx, GetControlTransferAddend getControlTransferAddend,
    GetBranchInfoAtTarget getBranchInfoAtTarget,
    RedirectControlTransferRelocations redirectControlTransferRelocations) {
  // Needs to run serially because it writes to the relocations array as well as
  // reading relocations of other sections.
  for (ELFFileBase *f : ctx.objectFiles) {
    auto getRelocBranchInfo =
        [&getBranchInfoAtTarget](
            Relocation &r,
            uint64_t addend) -> std::pair<Relocation *, uint64_t> {
      auto *target = dyn_cast_or_null<Defined>(r.sym);
      // We don't allow preemptible symbols or ifuncs (may go somewhere else),
      // absolute symbols (runtime behavior unknown), non-executable or writable
      // memory (ditto) or non-regular sections (no section data).
      if (!target || target->isPreemptible || target->isGnuIFunc() ||
          !target->section ||
          !(target->section->flags & llvm::ELF::SHF_EXECINSTR) ||
          (target->section->flags & llvm::ELF::SHF_WRITE) ||
          target->section->kind() != SectionBase::Regular)
        return {nullptr, 0};
      return getBranchInfoAtTarget(*cast<InputSection>(target->section),
                                   target->value + addend);
    };
    for (InputSectionBase *sb : f->getSections()) {
      auto *s = dyn_cast_or_null<InputSection>(sb);
      if (!s)
        continue;
      for (Relocation &r : s->relocations) {
        std::optional<uint64_t> addend = getControlTransferAddend(*s, r);
        if (!addend)
          continue;
        std::pair<Relocation *, uint64_t> targetAndAddend =
            getRelocBranchInfo(r, *addend);
        if (!targetAndAddend.first)
          continue;
        // Avoid getting stuck in an infinite loop if we encounter a branch
        // that (possibly indirectly) branches to itself. It is unlikely
        // that more than 5 iterations will ever be needed in practice.
        size_t iterations = 5;
        while (iterations--) {
          std::pair<Relocation *, uint64_t> nextTargetAndAddend =
              getRelocBranchInfo(*targetAndAddend.first,
                                 targetAndAddend.second);
          if (!nextTargetAndAddend.first)
            break;
          targetAndAddend = nextTargetAndAddend;
        }
        redirectControlTransferRelocations(r, *targetAndAddend.first);
      }
    }
  }
}
} // namespace lld::elf
#endif
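
The chain-following loop in applyBranchToBranchOptImpl can be modelled in isolation. The sketch below is a toy under stated assumptions, not lld code: `BranchMap`, `branchAt` and `resolveBranchChain` are hypothetical names, and a plain address-to-address map stands in for sections and relocations. It keeps the same shape as the header: look up the branch at the destination, follow further branches for at most 5 extra steps, and report the last destination found.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <unordered_map>

// Toy model (not lld's data structures): maps the address of a bare branch
// instruction to the address it branches to. Addresses that are not keys are
// treated as ordinary code.
using BranchMap = std::unordered_map<uint64_t, uint64_t>;

// Returns the destination of the branch located at `addr`, or std::nullopt
// if there is no bare branch at that address.
std::optional<uint64_t> branchAt(const BranchMap &branches, uint64_t addr) {
  auto it = branches.find(addr);
  if (it == branches.end())
    return std::nullopt;
  return it->second;
}

// Given the destination of a control transfer, returns where it could be
// redirected to, or std::nullopt if the destination is not a branch and
// there is nothing to redirect.
std::optional<uint64_t> resolveBranchChain(const BranchMap &branches,
                                           uint64_t dest) {
  std::optional<uint64_t> target = branchAt(branches, dest);
  if (!target)
    return std::nullopt;
  // Bounded iteration, as in the header, so that a branch that (possibly
  // indirectly) branches to itself cannot cause an infinite loop.
  size_t iterations = 5;
  while (iterations--) {
    std::optional<uint64_t> next = branchAt(branches, *target);
    if (!next)
      break;
    target = next;
  }
  return target;
}
```

In this model a chain 0x10 -> 0x20 -> 0x30 -> 0x40 resolves to 0x40, a self-branch terminates after the iteration budget is spent, and a chain longer than the budget is shortened but not fully collapsed. Stopping early is still correct, because every intermediate destination is itself a valid branch target.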