`clang -g -gpubnames -fdebug-types-section` now emits .debug_names
section with references to local type unit entries defined in COMDAT
.debug_info sections.
```
.section .debug_info,"G",@progbits,5657452045627120676,comdat
.Ltu_begin0:
...
.section .debug_names,"",@progbits
...
// DWARF32
.long .Ltu_begin0 # Type unit 0
// DWARF64
// .long .Ltu_begin0 # Type unit 0
```
When `.Ltu_begin0` is relative to a non-prevailing .debug_info section,
the relocation resolves to 0, which is a valid offset within the
.debug_info section.
```
cat > a.cc <<e
struct A { int x; };
inline A foo() { return {1}; }
int main() { foo(); }
e
cat > b.cc <<e
struct A { int x; };
inline A foo() { return {1}; }
void use() { foo(); }
e
clang++ -g -gpubnames -fdebug-types-section -fuse-ld=lld a.cc b.cc -o old
```
```
% llvm-dwarfdump old
...
Local Type Unit offsets [
LocalTU[0]: 0x00000000
]
...
Local Type Unit offsets [
LocalTU[0]: 0x00000000 // indistinguishable from a valid offset within .debug_info
]
```
https://dwarfstd.org/issues/231013.1.html proposes that we use a
tombstone value instead to inform consumers. This patch implements the
idea. The second LocalTU entry will now use 0xffffffff.
https://reviews.llvm.org/D84825 has a TODO that we should switch the
tombstone value for most `.debug_*` sections to UINT64_MAX. We have
postponed the change for more than three years for consumers to migrate.
At some point we shall make the change, so that .debug_names is no long
different from other debug section that is not .debug_loc/.debug_ranges.
Co-authored-by: Alexander Yermolovich <ayermolo@meta.com>
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.
I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
Copy relocations and canonical PLT entries are for symbols defined in a
DSO. Currently we create them even for a `Defined`, possibly leading to
an output that won't work at run-time (e.g. R_X86_64_JUMP_SLOT
referencing a null symbol).
```
% cat a.s
.globl _start, main
.type main, @function
_start: main: ret
.rodata
.quad main
% clang -fuse-ld=lld -pie -nostdlib a.s
% readelf -Wr a.out
Relocation section '.rela.plt' at offset 0x290 contains 1 entry:
Offset Info Type Symbol's Value Symbol's Name + Addend
00000000000033b8 0000000000000007 R_X86_64_JUMP_SLOT 12b0
```
Report an error instead for the default `-z text` mode. GNU ld reports
an error in `-z text` mode as well.
relocateNonAlloc is costly for .debug_* section relocating. We don't
want to burn CPU cycles on other targets' workarounds.
Remove a temporary workaround for Linux objtool after a proper fix
https://git.kernel.org/linus/b8ec60e1186cdcfce41e7db4c827cb107e459002
Move the R_386_GOTPC workaround for GCC<8 beside the R_PC workaround.
This fixes a mis-link when mixing compressed and non-compressed input to
LLD. When relaxing calls, we must respect the source file that the
section came from when deciding whether it's legal to use compressed
instructions. If the call in question comes from a non-rvc source, then it will not
expect 2-byte alignments and cascading failures may result.
This fixes https://github.com/llvm/llvm-project/issues/63964. The symptom
seen there is that a latter RISCV_ALIGN can't be satisfied and we either
fail an assert or produce a totally bogus link result. (It can be easily
reproduced by putting .p2align 5 right before the nop in the reduced
test case and running check-lld on an assertions enabled build.) However,
it's important to note this is just one possible symptom of the problem.
If the resulting binary has a runtime switch between rvc and non-rvc
routines (via e.g. ifuncs), then even if we manage to link we may execute invalid
instructions on a machine which doesn't implement compressed instructions.
For a label difference like `.uleb128 A-B`, MC generates a pair of
R_RISCV_SET_ULEB128/R_RISCV_SUB_ULEB128 if A-B cannot be folded as a
constant. GNU assembler generates a pair of relocations in more cases
(when A or B is in a code section with linker relaxation).
`.uleb128 A-B` is primarily used by DWARF v5
.debug_loclists/.debug_rnglists (DW_LLE_offset_pair/DW_RLE_offset_pair
entry kinds) implemented in Clang and GCC.
`.uleb128 A-B` can be used in SHF_ALLOC sections as well (e.g.
`.gcc_except_table`). This patch does not handle SHF_ALLOC.
`-z dead-reloc-in-nonalloc=` can be used to change the relocated value,
if the R_RISCV_SET_ULEB128 symbol is in a discarded section. We don't
check the R_RISCV_SUB_ULEB128 symbol since for the expected cases A and
B should be defined in the same input section.
The two fields are similar.
`versionId` is the Verdef index in the output file. It is set for
`--exclude-id=`, version script patterns, and `sym@ver` symbols.
`verdefIndex` is the Verdef index of a Sharedfile (SharedSymbol or a
copy-relocated Defined), the default value -1 is also used to indicate
that the symbol has not been matched by a version script pattern
(https://reviews.llvm.org/D65716).
It seems confusing to have two fields. Merge them so that we can
allocate one bit for #70130 (suppress --no-allow-shlib-undefined
error in the presence of a DSO definition).
The R_LARCH_{ADD,SUB}6 relocation type are usually used by DwarfCFA to
calculate a tiny offset. They appear after binutils 2.41, with GAS
enabling relaxation by default.
The two fields are similar.
`versionId` is the Verdef index in the output file. It is set for
version script patterns and `sym@ver` symbols.
`verdefIndex` is the Verdef index of a SharedSymbol. The default value
-1 is also used to indicate that the symbol has not been matched by a
version script pattern (https://reviews.llvm.org/D65716).
It seems confusing to have two fields. Merge them so that we can
allocate one bit for #70130 (suppress --no-allow-shlib-undefined
error in the presence of a DSO definition).
It seems that some functions (.text.unlikely.xxx) may have zero size,
which
makes some builds with enabled assertions fail. Removing the assertion
and
extending one test to fix the build.
The sorting can process such zero-sized functions so no changes there
are needed
Edited lld/ELF/Options.td to cdsort as well
CDSort function reordering outperforms the existing default heuristic (
hfsort/C^3) in terms of the performance of generated binaries while
being (almost) as fast. Thus, the suggestion is to change the default.
The speedup is up to 1.5% perf for large front-end binaries, and can be
moderate/neutral for "small" benchmarks.
High-level **perf impact** on two selected binaries:
clang-10 binary (built with LTO+AutoFDO/CSSPGO): wins on top of C^3 in
[0.3%..0.8%]
rocksDB-8 binary (built with LTO+CSSPGO): wins on top of C^3 in
[0.8%..1.5%]
More detailed measurements on the clang binary is at
[here](https://reviews.llvm.org/D152834#4445042)
The undefined symbol message suggests the source line when line number
information is available (see https://reviews.llvm.org/D31481).
When the undefined symbol is from a global variable, we won't get the
line information.
```
extern int undef;
namespace ns {
int *var[] = {
&undef
};
// DW_TAG_variable(DW_AT_decl_file/DW_AT_decl_line) is available while
// line number information is unavailable.
}
ld.lld: error: undefined symbol: undef
>>> referenced by undef-debug2.cc
>>> undef-debug2.o:(ns::var)
```
This patch utilizes `getEnclosingSymbol` to locate `var` and find
DW_TAG_variable for `var`:
```
ld.lld: error: undefined symbol: undef
>>> referenced by undef-debug2.cc:3 (/tmp/c/undef-debug2.cc:3)
>>> undef-debug2.o:(ns::var)
```
For a DSO with all DT_NEEDED entries accounted for, if it contains an
undefined non-weak symbol that shares a name with a non-exported
definition (hidden visibility or localized by a version script), and
there is no DSO definition, we should also report an error. Because the
definition is not exported, it cannot resolve the DSO reference at
runtime.
GNU ld introduced this error-checking in [April
2003](https://sourceware.org/pipermail/binutils/2003-April/026568.html).
The feature is available for executable links but not for -shared, and
it is
orthogonal to --no-allow-shlib-undefined. We make the feature part of
--no-allow-shlib-undefined and work with -shared when
--no-allow-shlib-undefined is specified.
A subset of this error-checking is covered by commit
1981b1b6b9 for --gc-sections discarded
sections. This patch covers non-discarded sections as well.
Internally, I have identified 2 bugs (which would fail with
LD_BIND_NOW=1) covered by commit
1981b1b6b9
On Windows when zlib is enabled, zlib header introduced some Windows
headers which defines max as a macro. Since OutputSections.cpp uses
std::max with template argument, this causes compilation error.
Define macro NOMINMAX to avoid this.
For an output section with no input section, GNU ld eliminates the
output section when there are only symbol assignments (e.g.
`.foo : { symbol = 42; }`) but not for `.foo : { . += 42; }`
(`SHF_ALLOC|SHF_WRITE`).
We choose to retain such an output section with a symbol assignment
(unless unreferenced `PROVIDE`). We copy the previous section flag (see
https://reviews.llvm.org/D37736) to hopefully make the current PT_LOAD
segment extend to the current output section:
* decrease the number of PT_LOAD segments
* If a new PT_LOAD segment is introduced without a page-size
alignment as a separator, there may be a run-time crash.
However, this `flags` copying behavior is not suitable for
`.foo : { . += 42; }` when `flags` contains `SHF_EXECINSTR`. The
executable bit is surprising
(https://discourse.llvm.org/t/lld-output-section-flag-assignment-behavior/74359).
I think we should drop SHF_EXECINSTR when copying `flags`. The risk is a
code section followed by `.foo : { symbol = 42; }` will be broken, which
I believe is unrelated as such uses are almost always related to data
sections.
For data-command-only output sections (e.g. `.foo : { QUAD(42) }`), we
keep allowing copyable SHF_WRITE.
Some tests are updated to drop the SHF_EXECINSTR flag. GNU ld doesn't
set SHF_EXECINSTR as well, though it sets SHF_WRITE for some tests while
we don't.
Follow-up to #69295: In `Writer<ELFT>::run`, the symbol passes are
flexible:
they can be placed almost everywhere before `scanRelocations`, with a
constraint
that the `computeIsPreemptible` pass must be invoked for linker-defined
non-local symbols.
Merge copyLocalSymbols and demoteLocalSymbolsInDiscardedSections to
simplify
code:
* Demoting local symbols can be made unconditional, not constrainted to
/DISCARD/ uses due to performance concerns
* `includeInSymtab` can be made faster
* Make symbol passes close to each other
* Decrease data cache misses due to saving an iteration over local
symbols
There is no speedup, likely due to the unconditional `dr->section`
access in `demoteAndCopyLocalSymbols`.
`gc-sections-tls.s` no longer reports an error because the TLS symbol is
converted to an Undefined.
One use of setSymbolAndType (related to https://reviews.llvm.org/D53864
"Do not crash when -r output uses linker script with `/DISCARD/`")
is no longer needed after commit 1981b1b6b9
demotes symbols in discarded sections to Undefined.
When an input section is matched by /DISCARD/ in a linker script, GNU ld
reports errors for relocations referencing symbols defined in the section:
`.aaa' referenced in section `.bbb' of a.o: defined in discarded section `.aaa' of a.o
Implement the error by demoting eligible symbols to `Undefined` and changing
STB_WEAK to STB_GLOBAL. As a side benefit, in relocatable links, relocations
referencing symbols defined relative to /DISCARD/ discarded sections no longer
set symbol/type to zeros.
It's arguable whether a weak reference to a discarded symbol should lead to
errors. GNU ld reports an error and our demoting approach reports an error as
well.
Close#58891
Co-authored-by: Bevin Hansson <bevin.hansson@ericsson.com>
History of demoteSharedSymbols:
* https://reviews.llvm.org/D45536 demotes SharedSymbol
* https://reviews.llvm.org/D111365 demotes lazy symbols
* The pending #69295 will demote symbols defined in discarded sections
The pass is placed after markLive just to be clear that it needs `isNeeded`
information computed by markLive. The remaining passes in Driver.cpp do not use
symbol information. Move the pass to Writer.cpp to be closer to other
symbol-related passes.
Note that llvm::support::endianness has been renamed to
llvm::endianness while becoming an enum class as opposed to an
enum. This patch replaces support::{big,little,native} with
llvm::endianness::{big,little,native}.
Note that llvm::support::endianness has been renamed to
llvm::endianness while becoming an enum class as opposed to an enum.
This patch replaces llvm::support::{big,little,native} with
llvm::endianness::{big,little,native}.
CDSort function reordering outperforms the existing default heuristic (
hfsort/C^3) in terms of the performance of generated binaries while
being (almost) as fast. Thus, the suggestion is to change the default.
The speedup is up to 1.5% perf for large front-end binaries, and can be
moderate/neutral for "small" benchmarks.
High-level **perf impact** on two selected binaries:
clang-10 binary (built with LTO+AutoFDO/CSSPGO): wins on top of C^3 in
[0.3%..0.8%]
rocksDB-8 binary (built with LTO+CSSPGO): wins on top of C^3 in
[0.8%..1.5%]
More detailed measurements on the clang binary is at
[here](https://reviews.llvm.org/D152834#4445042)
Integrating MTE globals on Android revealed a lot of cases where
libraries are built as both archives and DSOs, and they're linked into
fully static and dynamic executables respectively.
MTE globals doesn't work for fully static executables. They need a
dynamic loader to process the special R_AARCH64_RELATIVE relocation
semantics with the encoded offset. Fully static executables that had
out-of-bounds derived symbols (like 'int* foo_end = foo[16]') crash
under MTE globals w/ static executables. So, LLD in its current form
simply errors out when you try and compile a fully static executable
that has a single MTE global variable in it.
It seems like a much better idea to simply have LLD not do the special
work for MTE globals in fully static contexts, and to drop any
unnecessary metadata. This means that you can build archives with MTE
globals and link them into both fully-static and dynamic executables.
Now that llvm::support::endianness has been renamed to
llvm::endianness, we can use the shorter form. This patch replaces
llvm::support::endianness with llvm::endianness.
Some large AVR programs (for devices without long jump) may exceed
128KiB, and lld should give explicit errors other than generate wrong
executables silently.
For each R_X86_64_(REX_)GOTPCRELX relocation, check that the offset to the symbol is representable with 2^32 signed offset. If not, add a GOT entry for it and set its expr to R_GOT_PC so that we emit the GOT load instead of the relaxed lea. Do this in finalizeAddressDependentContent() where we iteratively attempt this (e.g. RISCV uses this for relaxation, ARM uses this to insert thunks).
Decided not to do the opposite of inserting GOT entries initially and removing them when relaxable because removing GOT entries isn't simple.
One drawback of this approach is that if we see any GOTPCRELX relocation, we'll create an empty .got even if it's not required in the end.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D157020
Before this patch, with MSVC I was seeing:
```
[304/334] Building CXX object tools\lld\ELF\CMakeFiles\lldELF.dir\InputFiles.cpp.obj
C:\git\llvm-project\lld\ELF\InputFiles.h(327): warning C4661: 'void lld::elf::ObjFile<llvm::object::ELF32LE>::importCmseSymbols(void)': no suitable definition provided for explicit template instantiation request
C:\git\llvm-project\lld\ELF\InputFiles.h(291): note: see declaration of 'lld::elf::ObjFile<llvm::object::ELF32LE>::importCmseSymbols'
C:\git\llvm-project\lld\ELF\InputFiles.h(327): warning C4661: 'void lld::elf::ObjFile<llvm::object::ELF32LE>::redirectCmseSymbols(void)': no suitable definition provided for explicit template instantiation request
C:\git\llvm-project\lld\ELF\InputFiles.h(292): note: see declaration of 'lld::elf::ObjFile<llvm::object::ELF32LE>::redirectCmseSymbols'
```
This patch removes `redirectCmseSymbols` which is not defined. And it
imports `importCmseSymbols` in InputFiles.cpp, because it is already
explicitly instantiated in ARM.cpp.
When the .eh_frame section is placed at a non-zero
offset within its output section, the relocation
value within .eh_frame are computed incorrectly.
We had similar issue in .ARM.exidx section and it has been
fixed already in https://reviews.llvm.org/D148033.
While applying the relocation using S+A-P, the value
of P (the location of the relocation) is getting wrong.
P is:
P = SecAddr + rel.offset, But SecAddr points to the
starting address of the outputsection rather than the
starting address of the eh frame section within that
output section.
This issue affects all targets which generates .eh_frame
section. Hence fixing in all the corresponding targets it affecting.
We are brining a new algorithm for function layout (reordering) based on the
call graph (extracted from a profile data). The algorithm is an improvement of
top of a known heuristic, C^3. It tries to co-locate hot and frequently executed
together functions in the resulting ordering. Unlike C^3, it explores a larger
search space and have an objective closely tied to the performance of
instruction and i-TLB caches. Hence, the name CDS = Cache-Directed Sort.
The algorithm can be used at the linking or post-linking (e.g., BOLT) stage.
Refer to https://reviews.llvm.org/D152834 for the actual implementation of the
reordering algorithm.
This diff adds a linker option to replace the existing C^3 heuristic with CDS.
The new behavior can be turned on by passing "--use-cache-directed-sort".
(the plan is to make it default in a next diff)
**Perf-impact**
clang-10 binary (built with LTO+AutoFDO/CSSPGO): wins on top of C^3 in [0.3%..0.8%]
rocksDB-8 binary (built with LTO+CSSPGO): wins on top of C^3 in [0.8%..1.5%]
Note that function layout affects the perf the most on older machines (with
smaller instruction/iTLB caches) and when huge pages are not enabled. The impact
on newer processors with huge pages enabled is likely neutral/minor.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D152840
Change the FF form --call-graph-profile-sort to --call-graph-profile-sort={none,hfsort}.
This will be extended to support llvm/lib/Transforms/Utils/CodeLayout.cpp.
--call-graph-profile-sort is not used in the wild but
--no-call-graph-profile-sort is (Chromium). Make --no-call-graph-profile-sort an
alias for --call-graph-profile-sort=none.
Reviewed By: rahmanl
Differential Revision: https://reviews.llvm.org/D159544
https://reviews.llvm.org/D48929 updated addends for non-SHF_ALLOC sections
relocated by REL for -r links, but the patch did not update the addends when
--compress-debug-sections={zlib,zstd} is used (#66738).
https://reviews.llvm.org/D116946 handled tombstone values in debug
sections in relocatable links. As a side effect, both
relocateNonAllocForRelocatable (using `sec->relocations`) and
relocatenonNonAlloc (using raw REL/RELA) may run.
Actually, we can adjust the condition in relocatenonAlloc to completely replace
relocateNonAllocForRelocatable. This patch implements this idea and fixes#66738.
As relocateNonAlloc processes the raw relocations like copyRelocations() does,
the condition `if (config->relocatable && type != target.noneRel)` in `copyRelocations`
(commit 08d6a3f133, modified by https://reviews.llvm.org/D62052)
can be made specific to SHF_ALLOC sections.
As a side effect, we can now report diagnostics for PC-relative relocations for
-r. This is a less useful diagnostic that is not worth too much code. As
https://github.com/ClangBuiltLinux/linux/issues/1937 has violations, just
suppress the warning for -r. Tested by commit 561b98f9e0.
The size of .ARM.exidx may shrink across `assignAddress` calls. It is
possible
that the initial iteration has a larger location counter, causing
`__code_size =
__code_end - .; osec : { . += __code_size; }` to report an error, while
the error would
have been suppressed for subsequent `assignAddress` iterations.
Other sections like .relr.dyn may change sizes across `assignAddress`
calls as
well. However, their initial size is zero, so it is difficiult to
trigger a
similar error.
Similar to https://reviews.llvm.org/D152170, postpone the error
reporting.
Fix#66836. While here, add more information to the error message.
Discussion about this approach: https://discourse.llvm.org/t/rfc-safer-whole-program-class-hierarchy-analysis/65144/18
When enabling WPD in an environment where native binaries are present, types we want to optimize can be derived from inside these native files and devirtualizing them can lead to correctness issues. RTTI can be used as a way to determine all such types in native files and exclude them from WPD providing a safe checked way to enable WPD.
The approach is:
1. In the linker, identify if RTTI is available for all native types. If not, under `--lto-validate-all-vtables-have-type-infos` `--lto-whole-program-visibility` is automatically disabled. This is done by examining all .symtab symbols in object files and .dynsym symbols in DSOs for vtable (_ZTV) and typeinfo (_ZTI) symbols and ensuring there's always a match for every vtable symbol.
2. During thinlink, if `--lto-validate-all-vtables-have-type-infos` is set and RTTI is available for all native types, identify all typename (_ZTS) symbols via their corresponding typeinfo (_ZTI) symbols that are used natively or outside of our summary and exclude them from WPD.
Testing:
ninja check-all
large Meta service that uses boost, glog and libstdc++.so runs successfully with WPD via --lto-whole-program-visibility. Previously, native types in boost caused incorrect devirtualization that led to crashes.
Reviewed By: MaskRay, tejohnson
Differential Revision: https://reviews.llvm.org/D155659