clang-p2996/lld/docs/ReleaseNotes.rst
Peter Collingbourne 494a74882b Reapply "ELF: Add branch-to-branch optimization."
Fixed assertion failure when reading .eh_frame sections, and added
.eh_frame sections to tests.

This reverts commit 1e95349dbe.

Original commit message follows:

When code calls a function that immediately tail calls another function,
there is no need to go via the intermediate function. By branching
directly to the target function we reduce the program's working set for
a slight increase in runtime performance.

Normally it is relatively uncommon to have functions that just tail call
another function, but with LLVM control flow integrity we have jump tables
that replace the function itself as the canonical address. As a result,
when a function address is taken and called directly, for example after
a compiler optimization resolves the indirect call, or if code built
without control flow integrity calls the function, the call will go via
the jump table.
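
The redirection described above can be pictured with a toy model. This is
only a sketch, not lld's implementation: lld works on relocations and
machine code, whereas this uses a dictionary, and the symbol names and the
redirect_branch helper are hypothetical. It models a single hop, as the
description suggests; whether longer chains are followed is not stated here.

```python
from typing import Dict, Optional

# Toy model (assumption): each symbol maps to the symbol it immediately
# tail calls, or to None if the function has a real body of its own.
TailCalls = Dict[str, Optional[str]]


def redirect_branch(tail_calls: TailCalls, target: str) -> str:
    """Return where a branch to `target` should point after the rewrite.

    If `target` does nothing but tail call another symbol, branch to that
    symbol directly; otherwise leave the branch unchanged.
    """
    forwarded = tail_calls.get(target)
    return forwarded if forwarded is not None else target


# Hypothetical symbols: a CFI-style jump table entry that only branches
# to the real function, plus an ordinary function with a real body.
tail_calls: TailCalls = {
    "f.jump_table_entry": "f.real",
    "f.real": None,
    "g": None,
}

# A direct call that would have gone through the jump table entry can
# now reach the real function without the intermediate branch.
new_target = redirect_branch(tail_calls, "f.jump_table_entry")
unchanged = redirect_branch(tail_calls, "g")
print(new_target, unchanged)
```

In this model the call through the jump table entry is retargeted to the
real function, while a call to a function with its own body is left alone.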

The impact of this optimization was measured using a large internal
Google benchmark. The results were as follows:

CFI enabled:  +0.1% ± 0.05% queries per second
CFI disabled: +0.01% queries per second [not statistically significant]

The optimization is enabled by default at -O2 but may also be enabled
or disabled individually with --{,no-}branch-to-branch.

This optimization is implemented for AArch64 and X86_64 only.

lld's runtime performance (real execution time) after adding this
optimization was measured using firefox-x64 from lld-speed-test [1]
with ldflags "-O2 -S" on an Apple M2 Ultra. The results are as follows:

```
    N           Min           Max        Median           Avg        Stddev
x 512     1.2264546     1.3481076     1.2970261     1.2965788   0.018620888
+ 512     1.2561196     1.3839965     1.3214632     1.3209327   0.019443971
Difference at 95.0% confidence
        0.0243538 +/- 0.00233202
        1.87831% +/- 0.179859%
        (Student's t, pooled s = 0.0190369)
```
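
The summary lines of the table can be re-derived from the per-row averages
and standard deviations. The sketch below assumes that the "x" row is the
baseline and the "+" row is the build with the optimization, and it uses
1.96 as a large-sample approximation of the Student's t critical value;
neither assumption is stated in the output itself.

```python
import math

# Values copied from the table above (N = 512 runs per row).
n = 512
mean_x, sd_x = 1.2965788, 0.018620888  # "x" row (assumed baseline)
mean_p, sd_p = 1.3209327, 0.019443971  # "+" row (assumed with the optimization)

# Difference of the two averages.
diff = mean_p - mean_x

# Pooled standard deviation for two samples of equal size.
pooled_sd = math.sqrt(((n - 1) * sd_x**2 + (n - 1) * sd_p**2) / (2 * n - 2))

# Half-width of the 95% confidence interval for the difference.
# Assumption: 1.96 approximates the t critical value at 1022 degrees of freedom.
t_crit = 1.96
half_width = t_crit * pooled_sd * math.sqrt(1 / n + 1 / n)

# The same quantities relative to the "x" row average.
rel_diff = 100 * diff / mean_x
rel_half_width = 100 * half_width / mean_x

print(f"difference: {diff:.7f} +/- {half_width:.8f}")
print(f"relative:   {rel_diff:.4f}% +/- {rel_half_width:.4f}%")
print(f"pooled s:   {pooled_sd:.7f}")
```

Under these assumptions the recomputed values agree with the reported
difference, interval, and pooled s to within rounding.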

[1] https://discourse.llvm.org/t/improving-the-reproducibility-of-linker-benchmarking/86057

Reviewers: zmodem, MaskRay

Reviewed By: MaskRay

Pull Request: https://github.com/llvm/llvm-project/pull/145579
2025-06-24 22:16:18 -07:00


===========================
lld |release| Release Notes
===========================

.. contents::
    :local:

.. only:: PreRelease

  .. warning::
     These are in-progress notes for the upcoming LLVM |release| release.
     Release notes for previous releases can be found on
     `the Download Page <https://releases.llvm.org/download.html>`_.

Introduction
============

This document contains the release notes for the lld linker, release |release|.
Here we describe the status of lld, including major improvements
from the previous release. All lld releases may be downloaded
from the `LLVM releases web site <https://llvm.org/releases/>`_.

Non-comprehensive list of changes in this release
=================================================

ELF Improvements
----------------

* Added ``-z dynamic-undefined-weak`` to make undefined weak symbols dynamic
  when the dynamic symbol table is present.
  (`#143831 <https://github.com/llvm/llvm-project/pull/143831>`_)
* For AArch64, added support for ``-zgcs-report-dynamic``, which enables checks
  for GNU GCS attribute flags in dynamic objects when GCS is enabled. Unless set
  explicitly, it inherits its value from ``-zgcs-report`` (capped at the
  ``warning`` level), for compatibility with GNU ld.
* The default Hexagon architecture version in ELF object files produced by
  lld is changed to v68. This change is only effective when the version is
  not provided in the command line by the user and cannot be inferred from
  inputs.
* Added ``--why-live=<glob>``, which prints, for each symbol matching
  ``<glob>``, a chain of items that kept it live during garbage collection.
  This is inspired by the Mach-O LLD feature of the same name.
* Linker script ``OVERLAY`` descriptions now support virtual memory regions
  (e.g. ``>region``) and ``NOCROSSREFS``.
* Added ``--xosegment`` and ``--no-xosegment`` flags to control whether to place
  executable-only and readable-executable sections in the same segment. The
  default value is ``--no-xosegment``.
  (`#132412 <https://github.com/llvm/llvm-project/pull/132412>`_)
* For AArch64, added support for the ``SHF_AARCH64_PURECODE`` section flag,
  which indicates that the section only contains program code and no data.
  An output section will only have this flag set if all input sections also
  have it set. (`#125689 <https://github.com/llvm/llvm-project/pull/125689>`_,
  `#134798 <https://github.com/llvm/llvm-project/pull/134798>`_)
* For AArch64 and ARM, added ``-zexecute-only-report``, which checks for
  missing ``SHF_AARCH64_PURECODE`` and ``SHF_ARM_PURECODE`` section flags
  on executable sections.
  (`#128883 <https://github.com/llvm/llvm-project/pull/128883>`_)
* For AArch64 and X86_64, added ``--branch-to-branch``, which rewrites branches
  that point to another branch instruction to instead branch directly to the
  target of the second instruction. Enabled by default at ``-O2``.

Breaking changes
----------------

* Executable-only and readable-executable sections are now allowed to be placed
  in the same segment by default. Pass ``--xosegment`` to lld to restore the
  old behavior.
* When using ``--no-pie`` without a ``SECTIONS`` command, the linker uses the
  target's default image base. If ``-Ttext=`` or ``--section-start`` specifies
  an output section address below this base, the linker now reports an error.
  Set ``--image-base`` to a lower address to fix the error.
  (`#140187 <https://github.com/llvm/llvm-project/pull/140187>`_)

COFF Improvements
-----------------

MinGW Improvements
------------------

MachO Improvements
------------------

WebAssembly Improvements
------------------------

Fixes
#####