Commit Graph

7236 Commits

Author SHA1 Message Date
Fangrui Song
26ddf4eee2 [ELF] Change .debug_names tombstone value to UINT32_MAX/UINT64_MAX (#74686)
`clang -g -gpubnames -fdebug-types-section` now emits .debug_names
section with references to local type unit entries defined in COMDAT
.debug_info sections.

```
.section        .debug_info,"G",@progbits,5657452045627120676,comdat
.Ltu_begin0:
...

.section        .debug_names,"",@progbits
...
// DWARF32
.long   .Ltu_begin0                     # Type unit 0
// DWARF64
// .long   .Ltu_begin0                     # Type unit 0
```

When `.Ltu_begin0` is relative to a non-prevailing .debug_info section,
the relocation resolves to 0, which is a valid offset within the
.debug_info section.

```
cat > a.cc <<e
struct A { int x; };
inline A foo() { return {1}; }
int main() { foo(); }
e
cat > b.cc <<e
struct A { int x; };
inline A foo() { return {1}; }
void use() { foo(); }
e
clang++ -g -gpubnames -fdebug-types-section -fuse-ld=lld a.cc b.cc -o old
```
```
% llvm-dwarfdump old
...
  Local Type Unit offsets [
    LocalTU[0]: 0x00000000
  ]
...
  Local Type Unit offsets [
    LocalTU[0]: 0x00000000  // indistinguishable from a valid offset within .debug_info
  ]
```

https://dwarfstd.org/issues/231013.1.html proposes that we use a
tombstone value instead to inform consumers. This patch implements the
idea. The second LocalTU entry will now use 0xffffffff.

https://reviews.llvm.org/D84825 has a TODO that we should switch the
tombstone value for most `.debug_*` sections to UINT64_MAX. We have
postponed the change for more than three years for consumers to migrate.
At some point we shall make the change, so that .debug_names is no long
different from other debug section that is not .debug_loc/.debug_ranges.

Co-authored-by: Alexander Yermolovich <ayermolo@meta.com>
2023-12-21 18:59:11 -08:00
Kazu Hirata
732bccb8c1 Use StringRef::{starts,ends}_with (NFC)
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.

I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
2023-12-14 07:53:20 -08:00
Fangrui Song
42e4967140 [ELF] Don't create copy relocation/canonical PLT entry for a defined symbol (#75095)
Copy relocations and canonical PLT entries are for symbols defined in a
DSO. Currently we create them even for a `Defined`, possibly leading to
an output that won't work at run-time (e.g. R_X86_64_JUMP_SLOT
referencing a null symbol).
```
% cat a.s
.globl _start, main
.type main, @function
_start: main: ret

.rodata
.quad main
% clang -fuse-ld=lld -pie -nostdlib a.s
% readelf -Wr a.out

Relocation section '.rela.plt' at offset 0x290 contains 1 entry:
    Offset             Info             Type               Symbol's Value  Symbol's Name + Addend
00000000000033b8  0000000000000007 R_X86_64_JUMP_SLOT                        12b0
```

Report an error instead for the default `-z text` mode. GNU ld reports
an error in `-z text` mode as well.
2023-12-12 10:14:36 -08:00
Fangrui Song
3fd1d6953d [ELF] relocateNonAlloc: clean up workaround code
relocateNonAlloc is costly for .debug_* section relocating. We don't
want to burn CPU cycles on other targets' workarounds.

Remove a temporary workaround for Linux objtool after a proper fix
https://git.kernel.org/linus/b8ec60e1186cdcfce41e7db4c827cb107e459002

Move the R_386_GOTPC workaround for GCC<8 beside the R_PC workaround.
2023-12-07 12:43:40 -08:00
Fangrui Song
a4d4b45aef [ELF] relocateNonAlloc: move likely expr == R_ABS before unlikely R_SIZE. NFC 2023-12-07 12:10:42 -08:00
Fangrui Song
23d402e5b7 [ELF] IWYU <optional> NFC 2023-12-06 15:18:46 -08:00
Philip Reames
62213be872 [LLD][RISCV] Fix incorrect call relaxation when mixing +c and -c objects (#73977)
This fixes a mis-link when mixing compressed and non-compressed input to
LLD. When relaxing calls, we must respect the source file that the
section came from when deciding whether it's legal to use compressed
instructions. If the call in question comes from a non-rvc source, then it will not
expect 2-byte alignments and cascading failures may result.

This fixes https://github.com/llvm/llvm-project/issues/63964. The symptom 
seen there is that a latter RISCV_ALIGN can't be satisfied and we either
fail an assert or produce a totally bogus link result. (It can be easily
reproduced by putting .p2align 5 right before the nop in the reduced
test case and running check-lld on an assertions enabled build.)  However,
it's important to note this is just one possible symptom of the problem.

If the resulting binary has a runtime switch between rvc and non-rvc
routines (via e.g. ifuncs), then even if we manage to link we may execute invalid
instructions on a machine which doesn't implement compressed instructions.
2023-12-01 11:02:53 -08:00
Adrian Prantl
2c07181424 [LEB128] Don't initialize error on success
This change removes an unnecessary branch from a hot path. It's also
questionable API to override any previous error unconditonally.
2023-11-29 12:47:27 -08:00
Adrian Prantl
69b0cb9c56 Revert "[LEB128] Don't initialize error on success"
This reverts commit 545c8e009e.
2023-11-29 12:40:37 -08:00
Adrian Prantl
545c8e009e [LEB128] Don't initialize error on success
This change removes an unnecessary branch from a hot path. It's also
questionable API to override any previous error unconditonally.
2023-11-29 12:16:32 -08:00
Weining Lu
84a20989c6 [lld][LoongArch] Add a another corner testcase for elf::getLoongArchPageDelta
Similar to e752b58e0d.
2023-11-25 20:38:45 +08:00
Fangrui Song
7ffabb61a5 [ELF] Support R_RISCV_SET_ULEB128/R_RISCV_SUB_ULEB128 in non-SHF_ALLOC sections (#72610)
For a label difference like `.uleb128 A-B`, MC generates a pair of
R_RISCV_SET_ULEB128/R_RISCV_SUB_ULEB128 if A-B cannot be folded as a
constant. GNU assembler generates a pair of relocations in more cases
(when A or B is in a code section with linker relaxation).

`.uleb128 A-B` is primarily used by DWARF v5
.debug_loclists/.debug_rnglists (DW_LLE_offset_pair/DW_RLE_offset_pair
entry kinds) implemented in Clang and GCC.

`.uleb128 A-B` can be used in SHF_ALLOC sections as well (e.g.
`.gcc_except_table`). This patch does not handle SHF_ALLOC.

`-z dead-reloc-in-nonalloc=` can be used to change the relocated value,
if the R_RISCV_SET_ULEB128 symbol is in a discarded section. We don't
check the R_RISCV_SUB_ULEB128 symbol since for the expected cases A and
B should be defined in the same input section.
2023-11-21 07:43:29 -08:00
dong jianqiang
89f095d204 [lld][ELF] Add armeb support when incoming bc is arm big endian (#72604)
Add armeb support when incoming bc is arm big endian:
Fix error: could not infer e_machine from bitcode target triple
armebv7-linux-gnueabi.
2023-11-20 21:12:06 -08:00
Fangrui Song
b8dface221 [ELF] -r: rename orphan SHT_REL/SHT_RELA when the relocated input section is placed in an output section
This ports https://reviews.llvm.org/D40652 (--emit-relocs) to -r and
matches GNU ld.
Close #67910
2023-11-17 22:38:15 -08:00
Brad Smith
3a12001925 [lld][ELF] Recognize sparcv9 bitcode (#72609) 2023-11-17 16:16:20 -05:00
Fangrui Song
ae7fb21b5a [ELF] Make some InputSection/InputFile member functions const. NFC 2023-11-16 20:24:14 -08:00
Fangrui Song
255ea48608 [ELF] Merge verdefIndex into versionId. NFC (#72208)
The two fields are similar.

`versionId` is the Verdef index in the output file. It is set for
`--exclude-id=`, version script patterns, and `sym@ver` symbols.

`verdefIndex` is the Verdef index of a Sharedfile (SharedSymbol or a
copy-relocated Defined), the default value -1 is also used to indicate
that the symbol has not been matched by a version script pattern
(https://reviews.llvm.org/D65716).

It seems confusing to have two fields. Merge them so that we can
allocate one bit for #70130 (suppress --no-allow-shlib-undefined
error in the presence of a DSO definition).
2023-11-16 01:03:52 -08:00
Fangrui Song
e84575449f Revert "[ELF] Merge verdefIndex into versionId. NFC" #72208 (#72484)
Reverts llvm/llvm-project#72208

If a unversioned Defined preempts a versioned DSO definition, the
version ID will not be reset.
2023-11-15 23:14:07 -08:00
Jinyang He
72accbfd0a [lld][LoongArch] Support the R_LARCH_{ADD,SUB}6 relocation type (#72190)
The R_LARCH_{ADD,SUB}6 relocation type are usually used by DwarfCFA to
calculate a tiny offset. They appear after binutils 2.41, with GAS
enabling relaxation by default.
2023-11-15 09:57:45 +08:00
Fangrui Song
667ea2ca40 [ELF] Merge verdefIndex into versionId. NFC (#72208)
The two fields are similar.

`versionId` is the Verdef index in the output file. It is set for
version script patterns and `sym@ver` symbols.

`verdefIndex` is the Verdef index of a SharedSymbol. The default value
-1 is also used to indicate that the symbol has not been matched by a
version script pattern (https://reviews.llvm.org/D65716).

It seems confusing to have two fields. Merge them so that we can
allocate one bit for #70130 (suppress --no-allow-shlib-undefined
error in the presence of a DSO definition).
2023-11-14 10:20:21 -08:00
spupyrev
ef6d187115 [ELF] Fix assertion in cdsort (#71708)
It seems that some functions (.text.unlikely.xxx) may have zero size,
which
makes some builds with enabled assertions fail. Removing the assertion
and
extending one test to fix the build.
The sorting can process such zero-sized functions so no changes there
are needed
2023-11-08 12:34:36 -08:00
Fangrui Song
339f5f727a [ELF] Set file for synthesized _binary_ symbols
Ensure the property that non-null `section` implies non-null `file`.
2023-11-06 12:05:13 -08:00
spupyrev
b53c04a8da Reapply [ELF] Making cdsort default for function reordering (#68638)
Edited lld/ELF/Options.td to cdsort as well

CDSort function reordering outperforms the existing default heuristic (
hfsort/C^3) in terms of the performance of generated binaries while
being (almost) as fast. Thus, the suggestion is to change the default.
The speedup is up to 1.5% perf for large front-end binaries, and can be
moderate/neutral for "small" benchmarks.

High-level **perf impact** on two selected binaries:
clang-10 binary (built with LTO+AutoFDO/CSSPGO): wins on top of C^3 in
[0.3%..0.8%]
rocksDB-8 binary (built with LTO+CSSPGO): wins on top of C^3 in
[0.8%..1.5%]

More detailed measurements on the clang binary is at
[here](https://reviews.llvm.org/D152834#4445042)
2023-11-03 16:03:06 -07:00
Fangrui Song
b169e7fedd [ELF] Improve undefined symbol message w/ DW_TAG_variable of the enclosing symbol but w/o line number information (#70854)
The undefined symbol message suggests the source line when line number
information is available (see https://reviews.llvm.org/D31481).
When the undefined symbol is from a global variable, we won't get the
line information.
```
extern int undef;
namespace ns {
int *var[] = {
  &undef
};
// DW_TAG_variable(DW_AT_decl_file/DW_AT_decl_line) is available while
// line number information is unavailable.
}

ld.lld: error: undefined symbol: undef
>>> referenced by undef-debug2.cc
>>>               undef-debug2.o:(ns::var)
```

This patch utilizes `getEnclosingSymbol` to locate `var` and find
DW_TAG_variable for `var`:
```
ld.lld: error: undefined symbol: undef
>>> referenced by undef-debug2.cc:3 (/tmp/c/undef-debug2.cc:3)
>>>               undef-debug2.o:(ns::var)
```
2023-11-03 13:53:36 -07:00
Fangrui Song
56aa727907 [ELF] Add getEnclosingSymbol for code sharing. NFC 2023-11-03 13:43:28 -07:00
Fangrui Song
49168b2512 [ELF] Enhance --no-allow-shlib-undefined to report non-exported definition (#70769)
For a DSO with all DT_NEEDED entries accounted for, if it contains an
undefined non-weak symbol that shares a name with a non-exported
definition (hidden visibility or localized by a version script), and
there is no DSO definition, we should also report an error. Because the
definition is not exported, it cannot resolve the DSO reference at
runtime.

GNU ld introduced this error-checking in [April
2003](https://sourceware.org/pipermail/binutils/2003-April/026568.html).
The feature is available for executable links but not for -shared, and
it is
orthogonal to --no-allow-shlib-undefined. We make the feature part of
--no-allow-shlib-undefined and work with -shared when
--no-allow-shlib-undefined is specified.

A subset of this error-checking is covered by commit
1981b1b6b9 for --gc-sections discarded
sections. This patch covers non-discarded sections as well.

Internally, I have identified 2 bugs (which would fail with
LD_BIND_NOW=1) covered by commit
1981b1b6b9
2023-11-03 11:05:09 -07:00
Yaxun (Sam) Liu
3594769f20 [ELF] Define NOMINMAX to fix zlib.h caused build failure on Windows (#70368)
On Windows when zlib is enabled, zlib header introduced some Windows
headers which defines max as a macro. Since OutputSections.cpp uses
std::max with template argument, this causes compilation error.

Define macro NOMINMAX to avoid this.
2023-11-02 08:59:54 -04:00
Fangrui Song
a40f651a06 [ELF] adjustOutputSections: don't copy SHF_EXECINSTR when an output does not contain input sections (#70911)
For an output section with no input section, GNU ld eliminates the
output section when there are only symbol assignments (e.g.
`.foo : { symbol = 42; }`) but not for `.foo : { . += 42; }`
(`SHF_ALLOC|SHF_WRITE`).

We choose to retain such an output section with a symbol assignment
(unless unreferenced `PROVIDE`). We copy the previous section flag (see
https://reviews.llvm.org/D37736) to hopefully make the current PT_LOAD
segment extend to the current output section:

* decrease the number of PT_LOAD segments
* If a new PT_LOAD segment is introduced without a page-size
  alignment as a separator, there may be a run-time crash.

However, this `flags` copying behavior is not suitable for
`.foo : { . += 42; }` when `flags` contains `SHF_EXECINSTR`. The
executable bit is surprising

(https://discourse.llvm.org/t/lld-output-section-flag-assignment-behavior/74359).

I think we should drop SHF_EXECINSTR when copying `flags`. The risk is a
code section followed by `.foo : { symbol = 42; }` will be broken, which
I believe is unrelated as such uses are almost always related to data
sections.

For data-command-only output sections (e.g. `.foo : { QUAD(42) }`), we
keep allowing copyable SHF_WRITE.

Some tests are updated to drop the SHF_EXECINSTR flag. GNU ld doesn't
set SHF_EXECINSTR as well, though it sets SHF_WRITE for some tests while
we don't.
2023-11-01 22:35:28 -07:00
Fangrui Song
ec0e556e67 [ELF] Merge copyLocalSymbols and demoteLocalSymbolsInDiscardedSections (#69425)
Follow-up to #69295: In `Writer<ELFT>::run`, the symbol passes are
flexible:
they can be placed almost everywhere before `scanRelocations`, with a
constraint
that the `computeIsPreemptible` pass must be invoked for linker-defined
non-local symbols.

Merge copyLocalSymbols and demoteLocalSymbolsInDiscardedSections to
simplify
code:

* Demoting local symbols can be made unconditional, not constrainted to
/DISCARD/ uses due to performance concerns
* `includeInSymtab` can be made faster
* Make symbol passes close to each other
* Decrease data cache misses due to saving an iteration over local
symbols

There is no speedup, likely due to the unconditional `dr->section`
access in `demoteAndCopyLocalSymbols`.

`gc-sections-tls.s` no longer reports an error because the TLS symbol is
converted to an Undefined.
2023-10-18 08:56:17 -07:00
Fangrui Song
bbf7b9d805 [ELF] Remove unused setSymbolAndType after #69295. NFC
One use of setSymbolAndType (related to https://reviews.llvm.org/D53864
"Do not crash when -r output uses linker script with `/DISCARD/`")
is no longer needed after commit 1981b1b6b9
demotes symbols in discarded sections to Undefined.
2023-10-17 15:08:13 -07:00
Fangrui Song
1981b1b6b9 [ELF] Demote symbols in /DISCARD/ discarded sections to Undefined (#69295)
When an input section is matched by /DISCARD/ in a linker script, GNU ld
reports errors for relocations referencing symbols defined in the section:

    `.aaa' referenced in section `.bbb' of a.o: defined in discarded section `.aaa' of a.o

Implement the error by demoting eligible symbols to `Undefined` and changing
STB_WEAK to STB_GLOBAL. As a side benefit, in relocatable links, relocations
referencing symbols defined relative to /DISCARD/ discarded sections no longer
set symbol/type to zeros.

It's arguable whether a weak reference to a discarded symbol should lead to
errors. GNU ld reports an error and our demoting approach reports an error as
well.

Close #58891

Co-authored-by: Bevin Hansson <bevin.hansson@ericsson.com>
2023-10-17 14:10:52 -07:00
Fangrui Song
fc5d815d54 [ELF] Merge demoteSymbols and isPreemptible computation. NFC
Remove one iteration of symtab and slightly improve the performance.
2023-10-17 13:52:08 -07:00
Fangrui Song
e9b9a1d320 [ELF] Move demoteSymbols to Writer.cpp. NFC
History of demoteSharedSymbols:

* https://reviews.llvm.org/D45536 demotes SharedSymbol
* https://reviews.llvm.org/D111365 demotes lazy symbols
* The pending #69295 will demote symbols defined in discarded sections

The pass is placed after markLive just to be clear that it needs `isNeeded`
information computed by markLive. The remaining passes in Driver.cpp do not use
symbol information. Move the pass to Writer.cpp to be closer to other
symbol-related passes.
2023-10-17 13:16:50 -07:00
Fangrui Song
60b3e05967 [ELF] Restore the --call-graph-profile-sort=hfsort default before #68638
The high time complexity of cache-directed sort is a real issue and is not
appropriate as the default, at least for now
(https://github.com/llvm/llvm-project/pull/68638#issuecomment-1760918891).
2023-10-12 22:58:42 -07:00
Kazu Hirata
4a0ccfa865 Use llvm::endianness::{big,little,native} (NFC)
Note that llvm::support::endianness has been renamed to
llvm::endianness while becoming an enum class as opposed to an
enum. This patch replaces support::{big,little,native} with
llvm::endianness::{big,little,native}.
2023-10-12 21:21:45 -07:00
Kazu Hirata
b8885926f8 Use llvm::endianness::{big,little,native} (NFC)
Note that llvm::support::endianness has been renamed to
llvm::endianness while becoming an enum class as opposed to an enum.
This patch replaces llvm::support::{big,little,native} with
llvm::endianness::{big,little,native}.
2023-10-10 22:54:51 -07:00
spupyrev
d5c1d735ad [ELF] Making cdsort default for function reordering (#68638)
CDSort function reordering outperforms the existing default heuristic (
hfsort/C^3) in terms of the performance of generated binaries while
being (almost) as fast. Thus, the suggestion is to change the default.
The speedup is up to 1.5% perf for large front-end binaries, and can be
moderate/neutral for "small" benchmarks.

High-level **perf impact** on two selected binaries:
clang-10 binary (built with LTO+AutoFDO/CSSPGO): wins on top of C^3 in
[0.3%..0.8%]
rocksDB-8 binary (built with LTO+CSSPGO): wins on top of C^3 in
[0.8%..1.5%]

More detailed measurements on the clang binary is at
[here](https://reviews.llvm.org/D152834#4445042)
2023-10-10 09:06:31 -07:00
Mitch Phillips
144d127bef [lld] [MTE] Drop MTE globals for fully static executables, not ban (#68217)
Integrating MTE globals on Android revealed a lot of cases where
libraries are built as both archives and DSOs, and they're linked into
fully static and dynamic executables respectively.

MTE globals doesn't work for fully static executables. They need a
dynamic loader to process the special R_AARCH64_RELATIVE relocation
semantics with the encoded offset. Fully static executables that had
out-of-bounds derived symbols (like 'int* foo_end = foo[16]') crash
under MTE globals w/ static executables. So, LLD in its current form
simply errors out when you try and compile a fully static executable
that has a single MTE global variable in it.

It seems like a much better idea to simply have LLD not do the special
work for MTE globals in fully static contexts, and to drop any
unnecessary metadata. This means that you can build archives with MTE
globals and link them into both fully-static and dynamic executables.
2023-10-10 17:32:10 +02:00
Kazu Hirata
d7b18d5083 Use llvm::endianness{,::little,::native} (NFC)
Now that llvm::support::endianness has been renamed to
llvm::endianness, we can use the shorter form.  This patch replaces
llvm::support::endianness with llvm::endianness.
2023-10-09 00:54:47 -07:00
Ben Shi
488a62f86e [lld][ELF][AVR] Add range check for R_AVR_13_PCREL (#67636)
Some large AVR programs (for devices without long jump) may exceed
128KiB, and lld should give explicit errors other than generate wrong
executables silently.
2023-10-06 18:20:03 +08:00
Arthur Eubanks
9d6ec280fc [lld/ELF] Don't relax R_X86_64_(REX_)GOTPCRELX when offset is too far
For each R_X86_64_(REX_)GOTPCRELX relocation, check that the offset to the symbol is representable with 2^32 signed offset. If not, add a GOT entry for it and set its expr to R_GOT_PC so that we emit the GOT load instead of the relaxed lea. Do this in finalizeAddressDependentContent() where we iteratively attempt this (e.g. RISCV uses this for relaxation, ARM uses this to insert thunks).

Decided not to do the opposite of inserting GOT entries initially and removing them when relaxable because removing GOT entries isn't simple.

One drawback of this approach is that if we see any GOTPCRELX relocation, we'll create an empty .got even if it's not required in the end.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D157020
2023-10-04 13:03:56 -07:00
Alexandre Ganea
a2ef046a2d [LLD][ELF] Import ObjFile::importCmseSymbols at call site (#68025)
Before this patch, with MSVC I was seeing:
```
[304/334] Building CXX object tools\lld\ELF\CMakeFiles\lldELF.dir\InputFiles.cpp.obj
C:\git\llvm-project\lld\ELF\InputFiles.h(327): warning C4661: 'void lld::elf::ObjFile<llvm::object::ELF32LE>::importCmseSymbols(void)': no suitable definition provided for explicit template instantiation request
C:\git\llvm-project\lld\ELF\InputFiles.h(291): note: see declaration of 'lld::elf::ObjFile<llvm::object::ELF32LE>::importCmseSymbols'
C:\git\llvm-project\lld\ELF\InputFiles.h(327): warning C4661: 'void lld::elf::ObjFile<llvm::object::ELF32LE>::redirectCmseSymbols(void)': no suitable definition provided for explicit template instantiation request
C:\git\llvm-project\lld\ELF\InputFiles.h(292): note: see declaration of 'lld::elf::ObjFile<llvm::object::ELF32LE>::redirectCmseSymbols'
```
This patch removes `redirectCmseSymbols` which is not defined. And it
imports `importCmseSymbols` in InputFiles.cpp, because it is already
explicitly instantiated in ARM.cpp.
2023-10-03 10:12:09 -04:00
simpal01
3cde1d8000 [ELF] Handle relocations in synthetic .eh_frame with a non-zero offset within the output section (#65966)
When the .eh_frame section is placed at a non-zero
offset within its output section, the relocation
value within .eh_frame are computed incorrectly.

We had similar issue in .ARM.exidx section and it has been
fixed already in https://reviews.llvm.org/D148033.

While applying the relocation using S+A-P,  the value
 of P (the location of the relocation) is getting wrong. 
P is:
  P = SecAddr + rel.offset, But SecAddr points to the
starting address of the outputsection rather than the
starting address of the eh frame section within that
output section.

This issue affects all targets which generates .eh_frame 
section. Hence fixing in all the corresponding targets it affecting.
2023-10-03 10:20:14 +01:00
Matheus Izvekov
8ff77a8f04 [NFC][LLD] Refactor some copy-paste into the Common library (#67598) 2023-09-28 00:06:48 +02:00
spupyrev
904b3f66f5 [ELF] A new code layout algorithm for function reordering [3a/3]
We are brining a new algorithm for function layout (reordering) based on the
call graph (extracted from a profile data). The algorithm is an improvement of
top of a known heuristic, C^3. It tries to co-locate hot and frequently executed
together functions in the resulting ordering. Unlike C^3, it explores a larger
search space and have an objective closely tied to the performance of
instruction and i-TLB caches. Hence, the name CDS = Cache-Directed Sort.
The algorithm can be used at the linking or post-linking (e.g., BOLT) stage.
Refer to https://reviews.llvm.org/D152834 for the actual implementation of the
reordering algorithm.

This diff adds a linker option to replace the existing C^3 heuristic with CDS.
The new behavior can be turned on by passing "--use-cache-directed-sort".
(the plan is to make it default in a next diff)

**Perf-impact**
clang-10 binary (built with LTO+AutoFDO/CSSPGO): wins on top of C^3 in [0.3%..0.8%]
rocksDB-8 binary (built with LTO+CSSPGO): wins on top of C^3 in [0.8%..1.5%]

Note that function layout affects the perf the most on older machines (with
smaller instruction/iTLB caches) and when huge pages are not enabled. The impact
on newer processors with huge pages enabled is likely neutral/minor.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D152840
2023-09-26 06:24:34 -07:00
Fangrui Song
8c556b7e2b [ELF] Change --call-graph-profile-sort to accept an argument
Change the FF form --call-graph-profile-sort to --call-graph-profile-sort={none,hfsort}.
This will be extended to support llvm/lib/Transforms/Utils/CodeLayout.cpp.

--call-graph-profile-sort is not used in the wild but
--no-call-graph-profile-sort is (Chromium). Make --no-call-graph-profile-sort an
alias for --call-graph-profile-sort=none.

Reviewed By: rahmanl

Differential Revision: https://reviews.llvm.org/D159544
2023-09-25 09:49:40 -07:00
Fangrui Song
f5b42eaadb [ELF] -r --compress-debug-sections: update implicit addends for .rel.debug_* referencing STT_SECTION symbols (#66804)
https://reviews.llvm.org/D48929 updated addends for non-SHF_ALLOC sections
relocated by REL for -r links, but the patch did not update the addends when
--compress-debug-sections={zlib,zstd} is used (#66738).

https://reviews.llvm.org/D116946 handled tombstone values in debug
sections in relocatable links. As a side effect, both
relocateNonAllocForRelocatable (using `sec->relocations`) and
relocatenonNonAlloc (using raw REL/RELA) may run.

Actually, we can adjust the condition in relocatenonAlloc to completely replace
relocateNonAllocForRelocatable. This patch implements this idea and fixes #66738.

As relocateNonAlloc processes the raw relocations like copyRelocations() does,
the condition `if (config->relocatable && type != target.noneRel)` in `copyRelocations`
(commit 08d6a3f133, modified by https://reviews.llvm.org/D62052)
can be made specific to SHF_ALLOC sections.

As a side effect, we can now report diagnostics for PC-relative relocations for
-r. This is a less useful diagnostic that is not worth too much code. As
https://github.com/ClangBuiltLinux/linux/issues/1937 has violations, just
suppress the warning for -r. Tested by commit 561b98f9e0.
2023-09-20 14:50:13 -07:00
Fangrui Song
0de0b6dded [ELF] Postpone "unable to move location counter backward" error (#66854)
The size of .ARM.exidx may shrink across `assignAddress` calls. It is
possible
that the initial iteration has a larger location counter, causing
`__code_size =
__code_end - .; osec : { . += __code_size; }` to report an error, while
the error would
have been suppressed for subsequent `assignAddress` iterations.

Other sections like .relr.dyn may change sizes across `assignAddress`
calls as
well. However, their initial size is zero, so it is difficiult to
trigger a
similar error.

Similar to https://reviews.llvm.org/D152170, postpone the error
reporting.
Fix #66836. While here, add more information to the error message.
2023-09-20 09:06:45 -07:00
Fangrui Song
678c1f142c [ELF] Remove a R_ARM_PCA special case from relocateNonAlloc
https://reviews.llvm.org/D75042 added a special case about R_ARM_PCA to
relocateNonAlloc. This is untested and actually unused in the wild.
2023-09-19 21:04:50 -07:00
modimo
272bd6f9cc [WPD][LLD] Add option to validate RTTI is enabled on all native types and prevent devirtualization on types with native RTTI
Discussion about this approach: https://discourse.llvm.org/t/rfc-safer-whole-program-class-hierarchy-analysis/65144/18

When enabling WPD in an environment where native binaries are present, types we want to optimize can be derived from inside these native files and devirtualizing them can lead to correctness issues. RTTI can be used as a way to determine all such types in native files and exclude them from WPD providing a safe checked way to enable WPD.

The approach is:
1. In the linker, identify if RTTI is available for all native types. If not, under `--lto-validate-all-vtables-have-type-infos` `--lto-whole-program-visibility` is automatically disabled. This is done by examining all .symtab symbols in object files and .dynsym symbols in DSOs for vtable (_ZTV) and typeinfo (_ZTI) symbols and ensuring there's always a match for every vtable symbol.
2. During thinlink, if `--lto-validate-all-vtables-have-type-infos` is set and RTTI is available for all native types, identify all typename (_ZTS) symbols via their corresponding typeinfo (_ZTI) symbols that are used natively or outside of our summary and exclude them from WPD.

Testing:
ninja check-all
large Meta service that uses boost, glog and libstdc++.so runs successfully with WPD via --lto-whole-program-visibility. Previously, native types in boost caused incorrect devirtualization that led to crashes.

Reviewed By: MaskRay, tejohnson

Differential Revision: https://reviews.llvm.org/D155659
2023-09-18 15:51:49 -07:00