These mismatched data layouts are exposed by the refactoring in
https://github.com/llvm/llvm-project/pull/105734 that now also
compares the non-integral pointers list of the data layout.
I dropped the explicit definition from all tests that use opt/llc since
those tools will insert the correct value, but for the llvm-as based
tests an explicit layout is required.
Reviewed By: MaskRay
Pull Request: https://github.com/llvm/llvm-project/pull/107276
On ARM64EC, a function symbol may appear in both mangled and demangled
forms:
- ARM64EC archives contain only the mangled name, while the demangled
symbol is defined by the object file as an alias.
- x86_64 archives contain only the demangled name (the mangled name is
usually defined by an object referencing the symbol as an alias to a
guess exit thunk).
- ARM64EC import files contain both the mangled and demangled names for
thunks.
If more than one archive defines the same function, this could lead to
different libraries being used for the same function depending on how
they are referenced. Avoid this by checking if the paired symbol is
already defined before adding a symbol to the table.
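As a toy illustration of the pairing check, the sketch below models a symbol table that refuses to add a lazy symbol when its mangled or demangled partner is already provided by a different archive. `pairedName`, `ToySymbolTable`, and the archive names are hypothetical. Only plain C names (`func` and `#func`) are handled, and letting the same file provide both spellings (as ARM64EC import files do) is an assumption of the sketch, not lld's code.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical helper: maps a plain C name to its ARM64EC pair and back
// ("func" <-> "#func"). C++ names use a different mangling scheme that
// this sketch does not model.
std::string pairedName(const std::string &name) {
  if (!name.empty() && name[0] == '#')
    return name.substr(1);
  return "#" + name;
}

// Toy symbol table recording which archive lazily provides each name.
struct ToySymbolTable {
  std::unordered_map<std::string, std::string> lazy;

  // Rejects a name if it is already present, or if its paired name is
  // provided by another archive, so the first archive wins for both names.
  bool addLazy(const std::string &name, const std::string &archive) {
    if (lazy.count(name))
      return false;
    auto pair = lazy.find(pairedName(name));
    if (pair != lazy.end() && pair->second != archive)
      return false;
    lazy.emplace(name, archive);
    return true;
  }

  // Resolves a reference through either spelling of the name.
  std::optional<std::string> archiveFor(const std::string &name) const {
    if (auto it = lazy.find(name); it != lazy.end())
      return it->second;
    if (auto it = lazy.find(pairedName(name)); it != lazy.end())
      return it->second;
    return std::nullopt;
  }
};
```

With this check, whichever archive is seen first serves both `func` and `#func`, so a function cannot come from different libraries depending on how it is referenced.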
On ARM64EC, external function calls emit a pair of weak-dependency
aliases: `func` to `#func` and `#func` to the `func` guess exit thunk
(instead of a single undefined `func` symbol, which would be emitted on
other targets). Allow such aliases to be overridden by lazy archive
symbols, just as we would for undefined symbols.
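The resolution rule can be sketched as a small decision function. The `Kind` enumeration below is a hypothetical simplification rather than lld's symbol class hierarchy. It only shows that an anti-dependency alias is treated like an undefined reference when an archive offers a real definition.

```cpp
#include <cassert>

// Toy symbol states; names are illustrative, not lld's classes.
enum class Kind { Undefined, AntiDependencyAlias, Lazy, Defined };

// Returns true if a lazy archive symbol for the same name should win,
// i.e. the archive member that defines it should be loaded.
bool lazyArchiveSymbolWins(Kind existing) {
  switch (existing) {
  case Kind::Undefined:
    return true;
  case Kind::AntiDependencyAlias:
    // The weak-dependency aliases emitted for ARM64EC external calls
    // ("func" -> "#func", "#func" -> guess exit thunk) are only fallbacks,
    // so they behave like an undefined reference here.
    return true;
  case Kind::Lazy:
  case Kind::Defined:
    return false;
  }
  return false;
}
```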
Autogenerate `.ll` code from cpp code in some `-icf-safe-thunk` tests
using `update_test_body.py`
```
PATH=build/bin:$PATH llvm/utils/update_test_body.py lld/test/MachO/icf-safe-thunks.ll lld/test/MachO/icf-safe-thunks-dwarf.ll
```
https://llvm.org/docs/TestingGuide.html#elaborated-tests
I recently became aware of this tool and I wanted to practice using it.
This also allows removing the custom instructions for generating the `.ll`
code.
Co-authored-by: Billy Laws <blaws05@gmail.com>
Anti-dependency symbols are allowed to be duplicated, with the first
definition taking precedence. If a regular weak alias is present, it is
preferred over an anti-dependency definition. Chaining anti-dependencies
is not allowed.
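The three rules above can be illustrated with a toy alias table. `ToyAliasTable` and its `add` method are hypothetical. How duplicate *regular* aliases are handled is not described in the text; the sketch simply keeps the first one, and it only checks chaining in one direction (a new anti-dependency pointing at an existing one).

```cpp
#include <cassert>
#include <map>
#include <string>

enum class AliasKind { Regular, AntiDependency };

struct ToyAlias {
  AliasKind kind;
  std::string target;
};

// Toy table of weak aliases, keyed by alias name.
struct ToyAliasTable {
  std::map<std::string, ToyAlias> aliases;

  // Returns false if the new alias is rejected.
  bool add(const std::string &name, AliasKind kind, const std::string &target) {
    if (kind == AliasKind::AntiDependency) {
      auto t = aliases.find(target);
      // Chaining anti-dependencies is not allowed.
      if (t != aliases.end() && t->second.kind == AliasKind::AntiDependency)
        return false;
    }
    auto it = aliases.find(name);
    if (it == aliases.end()) {
      aliases.emplace(name, ToyAlias{kind, target});
      return true;
    }
    ToyAlias &cur = it->second;
    // A regular weak alias is preferred over an anti-dependency.
    if (cur.kind == AliasKind::AntiDependency && kind == AliasKind::Regular)
      cur = ToyAlias{kind, target};
    // Otherwise the first definition is kept: duplicate anti-dependencies
    // are allowed, and (a sketch choice) duplicate regular aliases are ignored.
    return true;
  }
};
```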
We replace sed with awk as I couldn't find a syntax that works
consistently on Linux/Mac for sed.
Repro'ed original issue on Mac and confirmed working now on Mac/Linux.
Fix 'sed' spacing to ensure compatibility with all platforms.
Original failure:
https://lab.llvm.org/buildbot/#/builders/190/builds/7903
```
RUN: at line 33: sed -E '/^__OBJC_\$_CATEGORY_MyBaseClass_\$_Category01:/ { n; s/^[ \t]*\.quad[ \t]+l_OBJC_CLASS_NAME_$/\t.quad\tL_OBJC_IMAGE_INFO+3/ }' merge_cat_minimal.s > merge_cat_minimal_bad_name.s
+ sed -E '/^__OBJC_\$_CATEGORY_MyBaseClass_\$_Category01:/ { n; s/^[ \t]*\.quad[ \t]+l_OBJC_CLASS_NAME_$/\t.quad\tL_OBJC_IMAGE_INFO+3/ }' merge_cat_minimal.s
sed: 1: "/^__OBJC_\$_CATEGORY_My ...": bad flag in substitute command: '}'
```
This patch enhances the robustness of lld's Objective-C category
merging. Currently, the category merger assumes it can fully parse and
understand the format of all categories in the input, triggering an
assert if any invalid category data is encountered.
This can trigger asserts in certain rare corner cases that are
difficult to reproduce in small test cases. The proposed changes modify
the behavior so that if invalid category data is detected, category
merging is skipped for that specific class and all other categories
sharing the same base class. This approach allows the linker to continue
processing other categories without failing entirely due to a single
problematic input.
We also add a LIT test where we corrupt category data and check that
category merging for that class was skipped but the link was successful.
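The skipping policy can be sketched as a grouping step over parsed categories. `ToyCategory` and its `wellFormed` flag are hypothetical stand-ins for lld's actual parsing results. The point is that one malformed category disqualifies every category with the same base class, while other classes are still merged.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy parsed category; `wellFormed` is false when the linker could not
// fully understand the category data.
struct ToyCategory {
  std::string baseClass;
  bool wellFormed;
};

// Returns the base classes whose categories may still be merged.
std::set<std::string> mergeableClasses(const std::vector<ToyCategory> &cats) {
  std::set<std::string> all, poisoned;
  for (const ToyCategory &c : cats) {
    all.insert(c.baseClass);
    if (!c.wellFormed)
      poisoned.insert(c.baseClass); // skip merging for this whole class
  }
  std::set<std::string> result;
  for (const std::string &cls : all)
    if (!poisoned.count(cls))
      result.insert(cls);
  return result;
}
```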
The assumption that a symbol is either `Defined` or `Undefined` is not
always true for some cases. For example, `mangleMaybe` may create a weak
alias to a lazy archive symbol.
The inaccurate #111945 condition fixes a PROVIDE regression (#111478)
but introduces another regression: in a DSO link, if a symbol referenced
only by bitcode files is defined as PROVIDE_HIDDEN, lld would not set
the visibility correctly, leading to an assertion failure in
DynamicReloc::getSymIndex (https://reviews.llvm.org/D123985).
This is because `(sym->isUsedInRegularObj || sym->exportDynamic)` is
initially false (bitcode undef does not set `isUsedInRegularObj`) then
true (in `addSymbol`, after LTO compilation).
Fix this by making the condition accurate: use a map to track defined
symbols.
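A hypothetical illustration of why remembering the state in a map makes the answer stable: the toy `ProvideDecider` below records, the first time a PROVIDE symbol is queried, whether it was defined, and answers from that record afterwards. This is one possible reading of "use a map to track defined symbols" and does not reproduce lld's actual data structures or its exact condition.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

struct ToySym {
  bool defined;
};

// Remembers whether each PROVIDE symbol was defined when first queried, so
// the decision no longer flips after the symbol is demoted to Undefined.
class ProvideDecider {
  std::unordered_map<std::string, bool> definedWhenFirstSeen;

public:
  bool shouldAdd(const std::string &name, const ToySym &sym) {
    // try_emplace only stores a value the first time a name is seen.
    auto it = definedWhenFirstSeen.try_emplace(name, sym.defined).first;
    return !it->second;
  }
};
```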
Reviewers: smithp35
Reviewed By: smithp35
Pull Request: https://github.com/llvm/llvm-project/pull/112386
Currently, the WebAssembly/WASI target does not provide direct support for
code coverage.
This patch set fixes several issues to unlock the feature. The main
changes are:
1. Port `compiler-rt/lib/profile` to WebAssembly/WASI.
2. Adjust profile metadata sections for Wasm object file format.
- [CodeGen] Emit `__llvm_covmap` and `__llvm_covfun` as custom sections
instead of data segments.
- [lld] Align the interval space of custom sections at link time.
- [llvm-cov] Copy misaligned custom section data if the start address is
not aligned.
- [llvm-cov] Read `__llvm_prf_names` from data segments
3. [clang] Link with profile runtime libraries if requested
See each commit message for more details and rationale.
This is part of the effort to add code coverage support in Wasm target
of Swift toolchain.
When encountering an instruction like `if (p0) r0 = add(r0,##bar@GOT)`,
lld would fail with:
```
ld.lld: error: unrecognized instruction for 16_X type: 0x7400C000
```
This issue was encountered while building libreadline with clang 19.1.0.
Fixes: #111876
Case: `PROVIDE(f1 = bar);` when both `f1` and `bar` are in separate
sections that would be discarded by GC.
Due to `demoteDefined`, `shouldAddProvideSym(f1)` may initially return
false (when Defined) and then return true (after being demoted to Undefined).
```
addScriptReferencedSymbolsToSymTable
shouldAddProvideSym(f1): false
// the RHS (bar) is not added to `referencedSymbols` and may be GCed
declareSymbols
shouldAddProvideSym(f1): false
markLive
demoteSymbolsAndComputeIsPreemptible
// demoted f1 to Undefined
processSymbolAssignments
addSymbol
shouldAddProvideSym(f1): true
```
The inconsistency can cause `cmd->expression()` in `addSymbol` to be
evaluated, leading to `symbol not found: bar` errors (since `bar` in the
RHS is not in `referencedSymbols` and is GCed) (#111478).
Fix this by adding a `sym->isUsedInRegularObj` condition, making
`shouldAddProvideSym(f1)` values consistent. In addition, we need a
`sym->exportDynamic` condition to keep provide-shared.s working.
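The inconsistency in the trace can be reproduced with a toy predicate that only looks at whether the symbol is defined. `ToySym`, `shouldAddProvideSymNaive`, and `demote` are hypothetical simplifications of the passes named in the trace; the fix with `isUsedInRegularObj` is not modelled.

```cpp
#include <cassert>

// Toy symbol carrying only the state the naive predicate inspects.
struct ToySym {
  bool defined = false;
  bool common = false;
};

// Naive predicate: add the PROVIDE symbol only if nothing defines it.
bool shouldAddProvideSymNaive(const ToySym &s) { return !s.defined && !s.common; }

// Models demotion of a symbol whose section was discarded by GC.
void demote(ToySym &s) { s.defined = false; }
```

Calling the predicate before and after `demote` gives different answers for the same symbol, which is what lets `addSymbol` evaluate an RHS that was never added to `referencedSymbols`.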
Fixes: ebb326a51f
Pull Request: https://github.com/llvm/llvm-project/pull/111945
In `--icf=safe_thunks` mode, the linker differentiates `keepUnique`
functions by creating thunks during a post-processing step after
Identical Code Folding (ICF). While this ensures that `keepUnique`
functions themselves are not incorrectly merged, it overlooks functions
that reference these `keepUnique` symbols.
If two functions are identical except for references to different
`keepUnique` functions, the current ICF algorithm incorrectly considers
them identical because it doesn't account for the future differentiation
introduced by thunks. This leads to incorrect deduplication of functions
that should remain distinct.
To address this issue, we modify the ICF comparison to explicitly check
for references to `keepUnique` functions during deduplication. By doing
so, functions that reference different `keepUnique` symbols are
correctly identified as distinct, preventing erroneous merging and
ensuring the correctness of the linked output.
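The modified comparison can be sketched on toy functions. `ToyFunc`, `foldable`, and the equivalence-class map are hypothetical. References to ordinary functions are compared by equivalence class, while references to `keepUnique` functions must name the same symbol, because each of those will later get its own thunk and address.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy function: an opaque body plus the symbols it references, in order.
struct ToyFunc {
  std::string body;
  std::vector<std::string> refs;
};

bool foldable(const ToyFunc &a, const ToyFunc &b,
              const std::map<std::string, int> &eqClass,
              const std::set<std::string> &keepUnique) {
  if (a.body != b.body || a.refs.size() != b.refs.size())
    return false;
  for (size_t i = 0; i < a.refs.size(); ++i) {
    const std::string &x = a.refs[i], &y = b.refs[i];
    if (keepUnique.count(x) || keepUnique.count(y)) {
      if (x != y) // distinct keepUnique targets stay distinct
        return false;
    } else if (eqClass.at(x) != eqClass.at(y)) {
      return false;
    }
  }
  return true;
}
```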
The RISC-V psABI states that "The `R_RISCV_PCREL_LO12_I` or
`R_RISCV_PCREL_LO12_S` relocations contain a label pointing to an
instruction in the same section with an `R_RISCV_PCREL_HI20` relocation
entry that points to the target symbol."
Without this patch, GNU ld reports an error but LLD does not, I think
because LLD is doing the right thing, certainly in the provided test case.
Nonetheless, I think an error is good here to bring LLD in line with
what GNU ld is doing in showing that the object the user provided is not
following the psABI as written.
Fixes #107304
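A sketch of the psABI pairing rule as a per-section check. `ToyReloc`, `checkPcrelLo`, and the error text are hypothetical, not LLD's implementation or diagnostic. For each `R_RISCV_PCREL_LO12_*` relocation, the instruction its label points to must carry an `R_RISCV_PCREL_HI20` relocation in the same section.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

enum class RelType { PCREL_HI20, PCREL_LO12_I, PCREL_LO12_S };

// Toy relocation within one section. For PCREL_LO12_*, `labelOffset` is the
// section offset that the relocation's label symbol points to.
struct ToyReloc {
  RelType type;
  uint64_t offset;
  uint64_t labelOffset;
};

// Returns one (hypothetical) error message per unpaired LO12 relocation.
std::vector<std::string> checkPcrelLo(const std::vector<ToyReloc> &relocs) {
  std::vector<std::string> errors;
  for (const ToyReloc &lo : relocs) {
    if (lo.type == RelType::PCREL_HI20)
      continue;
    bool found = false;
    for (const ToyReloc &hi : relocs)
      if (hi.type == RelType::PCREL_HI20 && hi.offset == lo.labelOffset)
        found = true;
    if (!found)
      errors.push_back("PCREL_LO12 at offset " + std::to_string(lo.offset) +
                       ": label does not point to a PCREL_HI20 instruction");
  }
  return errors;
}
```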
We've noticed that for large builds, executing the thin link can take on the
order of tens of minutes. We are only using a single thread to write the
sharded indices and import files for each input bitcode file. While we
need to ensure the index file produced lists modules in a deterministic
order, that doesn't prevent us from executing the rest of the work in
parallel.
In this change we use a thread pool to execute as much of the backend's
work as possible in parallel. In local testing on a machine with 80
cores, this change makes a thin-link for ~100,000 input files run in ~2
minutes. Without this change it takes upwards of 10 minutes.
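The key idea, parallel work with deterministically ordered results, can be sketched with standard futures. `processModule` and the `.thinlto.bc` suffix are placeholders for the real per-module work. `std::async` here launches one thread per module, which is fine for a sketch but not for ~100,000 inputs; the patch uses a thread pool instead.

```cpp
#include <cassert>
#include <future>
#include <string>
#include <vector>

// Placeholder for the per-module work (writing a sharded index and an
// imports file for one bitcode input).
std::string processModule(const std::string &module) {
  return module + ".thinlto.bc";
}

// Runs per-module work concurrently but collects results in input order,
// so anything that lists modules stays deterministic.
std::vector<std::string> processAll(const std::vector<std::string> &modules) {
  std::vector<std::future<std::string>> futures;
  futures.reserve(modules.size());
  for (const std::string &m : modules)
    futures.push_back(std::async(std::launch::async, processModule, m));
  std::vector<std::string> results;
  results.reserve(futures.size());
  for (auto &f : futures)
    results.push_back(f.get()); // waits in submission order
  return results;
}
```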
---------
Co-authored-by: Nuri Amari <nuriamari@fb.com>
There is a bug in the current implementation of `--icf=safe_thunks`
where a STABS entry is emitted for generated thunks. This is problematic
because dsymutil then assumes the entire function body is at the thunk
location, when in reality only a single branch is present there. The
result is invalid DWARF with overlapping entries.
To fix this we never generate STABS entries for such thunks.
The existing `--icf=safe_thunks` test is updated to also generate debug
info and we add a check that no corrupt DWARF is generated.
As a future TODO we need to make `--keep-icf-stabs` compatible with
`--icf=safe_thunks`.
__arm64x_native_entrypoint and __guard_check_icall_a64n_fptr are
relevant only for hybrid ARM64X images; we need support for separate
namespaces before we can support them.
__hybrid_image_info_bitfield is 0 in the MSVC linker in all tests I tried.
When Branch Target Identification (BTI) is enabled, all indirect branches
must target a BTI instruction. A long branch thunk is a source of
indirect branches. To date LLD has been assuming that the object
producer is responsible for putting a BTI instruction at all places the
linker might generate an indirect branch to. This is true for clang, but
not for GCC. GCC will elide the BTI instruction when it can prove that
there are no indirect branches from outside the translation unit(s). GNU
ld was fixed to generate a landing pad stub (GNU ld terminology for a thunk) for
the destination when a long range stub was needed [1].
This means that using GCC compiled objects with LLD may lead to LLD
generating an indirect branch to a location without a BTI. The ABI [2]
has also been clarified to say that it is a static linker's
responsibility to generate a landing pad when the target does not have a
BTI.
This patch implements the same mechanism as GNU ld. When the output ELF
file sets the GNU_PROPERTY_AARCH64_FEATURE_1_BTI property, we check
whether the destination starts with a BTI instruction. If it does not, we
generate a landing pad consisting of:
BTI c
B <destination>
The B <destination> can be elided if the thunk can be placed so that
control flow drops through. For example:
BTI c
<destination>:
This will be common when -ffunction-sections is used.
The landing pad thunks are effectively alternative entry points for the
function. Direct branches are unaffected but any linker generated
indirect branch needs to use the alternative. We place these as close as
possible to the destination section.
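The landing-pad decision and contents can be sketched with the A64 encodings of BTI (HINT #32, #34, #36, #38) and the unconditional `B` instruction. `isLandingPad` and `makeLandingPad` are hypothetical helpers, not LLD functions. The sketch ignores PACIASP/PACIBSP, which can also act as landing pads, and assumes the thunk branches through x16/x17, which BTI c, j and jc all accept.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr uint32_t kBti = 0xd503241f;   // BTI (no targets accepted)
constexpr uint32_t kBtiC = 0xd503245f;  // BTI c
constexpr uint32_t kBtiJ = 0xd503249f;  // BTI j
constexpr uint32_t kBtiJC = 0xd50324df; // BTI jc

// Whether the destination instruction accepts a BR x16/x17 from a thunk.
bool isLandingPad(uint32_t insn) {
  return insn == kBtiC || insn == kBtiJ || insn == kBtiJC;
}

// Builds "BTI c; B <destination>", where `branchOffset` is the byte offset
// from the B instruction to the destination. When the pad sits directly
// before the destination, control falls through and the B is omitted.
std::vector<uint32_t> makeLandingPad(int64_t branchOffset, bool fallsThrough) {
  std::vector<uint32_t> code = {kBtiC};
  if (!fallsThrough) {
    uint32_t imm26 = static_cast<uint32_t>(branchOffset >> 2) & 0x03ffffff;
    code.push_back(0x14000000 | imm26); // B <destination>
  }
  return code;
}
```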
There is some further optimization possible. Consider the case:
.text
fn1
...
fn2
...
If we need landing pad thunks for both fn1 and fn2 we could order them
so that the thunk for fn1 immediately precedes fn1. This could save a
single branch. However I didn't think that would be worth the additional
complexity.
[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106671
[2] https://github.com/ARM-software/abi-aa/issues/196
Instead of always generating __wasm_apply_data_relocs when relevant
options like -pie and -shared are specified, generate it only when the
relevant relocations are actually necessary.
Note: omitting an empty __wasm_apply_data_relocs is not a problem because
the export is optional in the spec (DynamicLinking.md), and all runtime
linker implementations I'm aware of (emscripten, toywasm, wasm-tools)
treat it that way.
Motivations:
* This possibly reduces the module size
* This is also preparation for fixing
https://github.com/llvm/llvm-project/issues/107387, for which it isn't
obvious at the time of createSyntheticSymbols whether we need these
relocations (unless we introduce a new explicit option like
--non-pie-dynamic-link).
Fill the regular delay-load IAT with x86_64 delay-load thunks. Similarly
to regular imports, create an auxiliary IAT and its copy for ARM64EC
calls. These are filled with the same `__impchk_` thunks used for
regular imports, which perform an indirect call with
`__icall_helper_arm64ec` on the regular delay-load IAT. These auxiliary
IATs are exposed via CHPE metadata starting from version 2.
The MSVC linker creates one more copy of the auxiliary IAT. `__imp_func`
symbols refer to that hidden IAT, while the `#func` thunk performs a
call with the public auxiliary IAT. If the public auxiliary IAT is fine
for `#func`, it should be fine for calls using the `__imp_func` symbol
as well. Therefore, I made `__imp_func` refer to that IAT too.
The MSVC linker generates range extensions for these thunks when needed.
This commit inlines the range extension into the thunk, making it both
slightly more optimal and easier to implement in LLD.
Swap `!DisassembleZeroes` and `if (DumpARMELFData)` conditions so that
in the false DisassembleZeroes case (default), `...` will be printed for
long consecutive zeroes, even when a data mapping symbol is active.
This is especially useful for certain lld tests that insert a huge
padding within a code section. Without `...` the output will be huge.
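A toy version of the reordered loop: the zero-run check comes before the data-mapping decision, so long zero runs collapse to `...` even inside a data region. `dump` and the `minZeroRun` threshold are hypothetical and do not reflect llvm-objdump's actual code or threshold.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

std::vector<std::string> dump(const std::vector<uint32_t> &words,
                              bool disassembleZeroes, bool inDataRegion,
                              size_t minZeroRun) {
  std::vector<std::string> out;
  size_t i = 0;
  while (i < words.size()) {
    // Checked first, so it also applies while a data mapping symbol is active.
    if (!disassembleZeroes) {
      size_t n = 0;
      while (i + n < words.size() && words[i + n] == 0)
        ++n;
      if (n >= minZeroRun) {
        out.push_back("...");
        i += n;
        continue;
      }
    }
    out.push_back(inDataRegion ? ".word" : "insn");
    ++i;
  }
  return out;
}
```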
Pull Request: https://github.com/llvm/llvm-project/pull/109553
Followup to #104926.
We ran into issues on the emscripten waterfall where relocations against
`__dso_handle` were being reported as errors even though
`-r/--relocatable` was being used to generate object file output rather
than executable output.
The current logic assumes that the import file is pulled by object
files, and the loop for import files only needs to handle cases where
the `__imp_` symbol is implicitly pulled by an import thunk. This is
fragile, as the symbol may also be pulled through other means, such as
the -export argument in tests. Additionally, this logic is insufficient
for ARM64EC, which exposes multiple symbols through an import file, and
referencing any one of them causes all of them to be defined.
With this change, import symbols are added to `syms` more often, but we
ensure that output symbols remain unique later in the process.
Similar to commit 686cff17cc for SHT_REL (#57693).
CREL hasn't been tested with ICF before.
Also avoid a pitfall where eqClass[0] might interfere with ICF.
`WASM_MEMORY_ADDR_REL_` and `WASM_TABLE_INDEX_REL_` relocations against
**undefined symbols** are not supported and, except for
`UnresolvedPolicy::ReportError`, lead to incorrect Wasm code, such as
invalid data address or invalid table index that cannot be patched
during later dynamic Wasm linking with modules declaring those symbols.
This differs from other relocations, which support undefined symbols by
declaring corresponding Wasm imports.
For more robust behavior, `wasm-ld` should probably report an error for
such unsupported PIC relocations, independent of the `UnresolvedPolicy`.
Since symTab is a DenseMap, the order in which a symbol and its
corresponding import symbol are processed is not guaranteed, and when
the latter comes first, it is left undefined.
In addition to the auxiliary IAT, ARM64EC modules also contain a copy of
it. At runtime, the auxiliary IAT is filled with the addresses of actual
ARM64EC functions when possible. If patching is detected, the OS may use
the IAT copy to revert the auxiliary IAT, ensuring that the call checker
is used for calls to imported functions.
On ARM64EC, __imp_ symbols reference the auxiliary IAT, while __imp_aux_
symbols reference the regular IAT. However, x86_64 code expects both to
reference the regular IAT. This change adjusts the symbols accordingly,
matching the behavior observed in the MSVC linker.
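The resulting mapping can be summarized as a small lookup. `iatFor`, `Iat`, and `Arch` are hypothetical names; the function assumes its input already starts with `__imp_`. On ARM64EC, `__imp_` resolves to the auxiliary IAT and `__imp_aux_` to the regular one, while x86_64 code sees the regular IAT for both.

```cpp
#include <cassert>
#include <string>

enum class Iat { Regular, Auxiliary };
enum class Arch { ARM64EC, X86_64 };

// Which IAT an import symbol refers to, depending on the referencing code.
Iat iatFor(const std::string &sym, Arch arch) {
  if (arch == Arch::X86_64)
    return Iat::Regular; // x86_64 code expects the regular IAT for both
  const std::string auxPrefix = "__imp_aux_";
  if (sym.compare(0, auxPrefix.size(), auxPrefix) == 0)
    return Iat::Regular;
  return Iat::Auxiliary;
}
```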
When both CREL and the experimental lld partitions feature are enabled,
the relocation section may look like .crel.llvm_sympart.f1, and
`rels.relas` is empty. While here, support relocation sections with zero
entries.
In addition to the regular IAT, ARM64EC also includes an auxiliary IAT.
At runtime, the regular IAT is populated with the addresses of imported
functions, which may be x86_64 functions or the export thunks of ARM64EC
functions. The auxiliary IAT contains versions of functions that are
guaranteed to be directly callable by ARM64 code.
The linker fills the auxiliary IAT with the addresses of `__impchk_`
thunks. These thunks perform a call on the IAT address using
`__icall_helper_arm64ec` with the target address from the IAT. If the
imported function is an ARM64EC function, the OS may replace the address
in the auxiliary IAT with the address of the ARM64EC version of the
function (not its export thunk), avoiding the runtime call checker for
better performance.