Commit Graph

552 Commits

Author SHA1 Message Date
Fangrui Song
c03fdd3403 [ELF] Fix the branch range computation when reusing a thunk
Notation: dst is `t->getThunkTargetSym()->getVA()`

On AArch64, when `src-0x8000000-r_addend <= dst < src-0x8000000`, the condition
`target->inBranchRange(rel.type, src, rel.sym->getVA(rel.addend))` may
incorrectly consider a thunk reusable.
`rel.addend = -getPCBias(rel.type)` resets the addend to 0 for AArch64/PPC
and the zero addend is used by `rel.sym->getVA(rel.addend)` to check
out-of-range relocations.

See the test for a case this computation is wrong:
`error: a.o:(.text_high+0x4): relocation R_AARCH64_JUMP26 out of range: -134217732 is not in [-134217728, 134217727]`
I have seen a real world case with r_addend=19960.

Reviewed By: peter.smith

Differential Revision: https://reviews.llvm.org/D117734
2022-01-24 09:03:21 -08:00
Fangrui Song
54fe70bfba [ELF] RelocationScanner::scanOne: replace rel.r_offset with offset. NFC 2022-01-17 00:05:27 -08:00
Fangrui Song
4c36567179 [ELF] Relocations: remove some cast<Undefined>. NFC 2022-01-17 00:02:47 -08:00
Fangrui Song
b8d4eb84d7 [ELF] De-template getAlternativeSpelling. NFC 2022-01-16 23:56:25 -08:00
Fangrui Song
37a1291885 [ELF] Add RelocationScanner. NFC
Currently the way some relocation-related static functions pass around
states is clumsy. Add a Resolver class to store some states as member
variables.

Advantages:

* Avoid the parameter `InputSectionBase &sec` (this offsets the cost passing around `this` paramemter)
* Avoid the parameter `end` (Mips and PowerPC hacks)
* `config` and `target` can be cached as member variables to reduce global state accesses. (potential speedup because the compiler didn't know `config`/`target` were not changed across function calls)
* If we ever want to reduce if-else costs (e.g. `config->emachine==EM_MIPS` for non-Mips) or introduce parallel relocation scan not handling some tricky arches (PPC/Mips), we can templatize Resolver

`target` isn't used as much as `config`, so I change it to a const reference
during the migration.

There is a minor performance inprovement for elf::scanRelocations.

Reviewed By: ikudrin, peter.smith

Differential Revision: https://reviews.llvm.org/D116881
2022-01-11 09:54:53 -08:00
Fangrui Song
5dbbd4eeb8 [ELF] Move OffsetGetter before some static functions. NFC
to prepare for D116881.
2022-01-10 20:16:02 -08:00
Fangrui Song
7f1955dc96 [ELF] Support mixed TLSDESC and TLS GD
We only support both TLSDESC and TLS GD for x86 so this is an x86-specific
problem. If both are used, only one R_X86_64_TLSDESC is produced and TLS GD
accesses will incorrectly reference R_X86_64_TLSDESC. Fix this by introducing
SymbolAux::tlsDescIdx.

Reviewed By: ikudrin

Differential Revision: https://reviews.llvm.org/D116900
2022-01-10 10:03:21 -08:00
Fangrui Song
5d3bd7f360 [ELF] Move gotIndex/pltIndex/globalDynIndex to SymbolAux
to decrease sizeof(SymbolUnion) by 8 on ELF64 platforms.

Symbols needing such information are typically 1% or fewer (5134 out of 560520
when linking clang, 19898 out of 5550705 when linking chrome). Storing them
elsewhere can decrease memory usage and symbol initialization time.
There is a ~0.8% saving on max RSS when linking a large program.

Future direction:

* Move some of dynsymIndex/verdefIndex/versionId to SymbolAux
* Support mixed TLSDESC and TLS GD without increasing sizeof(SymbolUnion)

Reviewed By: peter.smith

Differential Revision: https://reviews.llvm.org/D116281
2022-01-09 13:43:27 -08:00
Fangrui Song
ddea3bf7d1 [ELF] Remove redundant cast. NFC 2022-01-05 02:07:15 -08:00
Fangrui Song
cb203f3f92 [ELF] Change InStruct/Partition pointers to unique_ptr
and remove associated make<XXX> calls.
gnuHash and sysvHash are unchanged, otherwise LinkerScript::discard would
destroy the objects which may be referenced by input section descriptions.

My x86-64 lld executable is 121+KiB smaller.
2021-12-27 18:15:23 -08:00
Fangrui Song
7924b3814f [ELF] Add Symbol::hasVersionSuffix
"Process symbol versions" may take 2+% time.
"Redirect symbols" may take 0.6% time.
This change speeds up the two passes and makes `*sym.getVersionSuffix()
== '@'` in the `undefined reference` diagnostic cleaner.

Linking chrome (no debug info) and another large program is 1.5% faster.

For empty-ver2.s: the behavior now matches GNU ld, though I'd consider the input
invalid and the exact behavior does not matter.
2021-12-26 17:25:54 -08:00
Fangrui Song
10316a6f94 [ELF] Change InputSectionDescription members from vector to SmallVector
This decreases sizeof(lld::elf::InputSectionDescription) from 264 to 232.
2021-12-26 13:06:54 -08:00
Fangrui Song
20b4704da3 [ELF] reportRangeError: mention symbol name for non-STT_SECTION local symbols like non-global symbols 2021-12-25 23:46:47 -08:00
Fangrui Song
a00f480fe8 [ELF] scanReloc: remove unused start parameter. NFC
This was once used as a workaround for detecting missing PPC64 TLSGD/TLSLD
relocations produced by ancient IBM XL C/C++.
2021-12-25 14:34:06 -08:00
Fangrui Song
dd4f5d4ae5 [ELF] De-template handleTlsRelocation. NFC 2021-12-25 14:23:13 -08:00
Fangrui Song
70912420bb [ELF] Move TLS dynamic relocations to postScanRelocations
This temporarily increases sizeof(SymbolUnion), but allows us to mov GOT/PLT/etc
index members outside Symbol in the future.

Then, we can make TLSDESC and TLSGD use different indexes and support mixed
TLSDESC and TLSGD (tested by x86-64-tlsdesc-gd-mixed.s).

Note: needsTlsGd and needsTlsGdToIe may optionally be combined.

Test updates are due to reordered GOT entries.
2021-12-24 22:36:49 -08:00
Fangrui Song
b5a0f0f397 [ELF] Add ELFFileBase::{elfShdrs,numELFShdrs} to avoid duplicate llvm::object::ELFFile::sections()
This mainly avoid `relsOrRelas` cost in `InputSectionBase::relocate`.
`llvm::object::ELFFile::sections()` has redundant and expensive checks.
2021-12-24 17:10:38 -08:00
Fangrui Song
e1b6b5be46 [ELF] Avoid referencing SectionBase::repl after ICF
It is fairly easy to forget SectionBase::repl after ICF.
Let ICF rewrite a Defined symbol's `section` field to avoid references to
SectionBase::repl in subsequent passes. This slightly improves the --icf=none
performance due to less indirection (maybe for --icf={safe,all} as well if most
symbols are Defined).

With this change, there is only one reference to `repl` (--gdb-index D89751).
We can undo f4fb5fd752 (`Move Repl to SectionBase.`)
but move `repl` to `InputSection` instead.

Reviewed By: ikudrin

Differential Revision: https://reviews.llvm.org/D116093
2021-12-24 12:09:48 -08:00
Fangrui Song
ad26b0b233 Revert "[ELF] Make Partition/InStruct members unique_ptr and remove associate make<XXX>"
This reverts commit e48b1c8a27.
This reverts commit d019de23a1.

The changes caused memory leaks (non-final classes cannot use unique_ptr).
2021-12-22 23:55:11 -08:00
Fangrui Song
d019de23a1 [ELF] Make InStruct members unique_ptr and remove associate make<XXX>
See D116143 for benefits. My lld executable (x86-64) is 24+KiB smaller.
2021-12-22 21:11:26 -08:00
Fangrui Song
5c75cc51b3 [ELF] Change nonnull pointer parameters to references. NFC 2021-12-22 21:09:57 -08:00
Fangrui Song
baa3eb0dd9 [ELF] Change some non-null pointer parameters to references. NFC 2021-12-22 20:51:11 -08:00
Fangrui Song
8617996ac1 [ELF] maybeReportUndefined: move sym.isUndefined() check to the caller. NFC
Avoid a function call in the majority of cases.
2021-12-16 00:27:19 -08:00
Fangrui Song
a8d6d2614b [ELF] Replace make<Defined> with makeDefined. NFC
This removes SpecificAlloc<Defined> and makes my lld executable 1.5k smaller.
This drops the small memory waste due to the separate BumpPtrAllocator.
2021-12-15 13:15:03 -08:00
Fangrui Song
7a54ae9c1d [ELF] Change objectFiles to ELFFileBase *
This can sometimes avoid `cast<ObjFile<...>>`.

I intentionally do not touch postScanRelocations to wait for its stabilization.
2021-12-15 00:37:10 -08:00
Fangrui Song
cf783be8d7 Reland D114783/D115603 [ELF] Split scanRelocations into scanRelocations/postScanRelocations
(Fixed an issue about GOT on a copy relocated alias.)
(Fixed an issue about not creating r_addend=0 IRELATIVE for unreferenced non-preemptible ifunc.)

The idea is to make scanRelocations mark some actions are needed (GOT/PLT/etc)
and postpone the real work to postScanRelocations. It gives some flexibility:

* Make it feasible to support .plt.got (PR32938): we need to know whether GLOB_DAT and JUMP_SLOT are both needed.
* Make non-preemptible IFUNC handling slightly cleaner: avoid setting/clearing sym.gotInIgot
* -z nocopyrel: report all copy relocation places for one symbol
* Make GOT deduplication feasible
* Make parallel relocation scanning feasible (if we can avoid all stateful operations and make Symbol attributes atomic), but parallelism may not be the appealing choice

Since this patch moves a large chunk of code out of ELFT templates. My x86-64
executable is actually a few hundred bytes smaller.

For ppc32-ifunc-nonpreemptible-pic.s: I remove absolute relocation references to non-preemptible ifunc
because absolute relocation references are incorrect in -fpie mode.

Reviewed By: peter.smith, ikudrin

Differential Revision: https://reviews.llvm.org/D114783
2021-12-14 16:28:41 -08:00
Fangrui Song
ea15b862d7 Revert D114783 [ELF] Split scanRelocations into scanRelocations/postScanRelocations
May cause a failure for non-preemptible `bcmp` in a glibc -static link.
2021-12-14 14:33:50 -08:00
Fangrui Song
b79686c6dc [ELF] Remove needsPltAddr in favor of needsCopy
needsPltAddr is equivalent to `needsCopy && isFunc`. In many places, it is
equivalent to `needsCopy` because the non-STT_FUNC cases are ruled out.

Reviewed By: ikudrin, peter.smith

Differential Revision: https://reviews.llvm.org/D115603
2021-12-14 09:52:43 -08:00
Fangrui Song
e7a95b0674 Reland [ELF] Split scanRelocations into scanRelocations/postScanRelocations
(Fixed an issue about GOT on a copy relocated alias.)

The idea is to make scanRelocations mark some actions are needed (GOT/PLT/etc)
and postpone the real work to postScanRelocations. It gives some flexibility:

* Make it feasible to support .plt.got (PR32938): we need to know whether GLOB_DAT and JUMP_SLOT are both needed.
* Make non-preemptible IFUNC handling slightly cleaner: avoid setting/clearing sym.gotInIgot
* -z nocopyrel: report all copy relocation places for one symbol
* Make GOT deduplication feasible
* Make parallel relocation scanning feasible (if we can avoid all stateful operations and make Symbol attributes atomic), but parallelism may not be the appealing choice

Since this patch moves a large chunk of code out of ELFT templates. My x86-64
executable is actually a few hundred bytes smaller.

For ppc32-ifunc-nonpreemptible-pic.s: I remove absolute relocation references to non-preemptible ifunc
because absolute relocation references are incorrect in -fpie mode.

Reviewed By: peter.smith, ikudrin

Differential Revision: https://reviews.llvm.org/D114783
2021-12-13 20:11:24 -08:00
Fangrui Song
0b8b86e30f Revert "[ELF] Split scanRelocations into scanRelocations/postScanRelocations"
This reverts commit fc33861d48.

`replaceWithDefined` should copy needsGot, otherwise an alias for a copy
relocated symbol may not have GOT entry if its needsGot was originally true.
2021-12-13 19:29:53 -08:00
Fangrui Song
fc33861d48 [ELF] Split scanRelocations into scanRelocations/postScanRelocations
The idea is to make scanRelocations mark some actions are needed (GOT/PLT/etc)
and postpone the real work to postScanRelocations. It gives some flexibility:

* Make it feasible to support .plt.got (PR32938): we need to know whether GLOB_DAT and JUMP_SLOT are both needed.
* Make non-preemptible IFUNC handling slightly cleaner: avoid setting/clearing sym.gotInIgot
* -z nocopyrel: report all copy relocation places for one symbol
* Make parallel relocation scanning possible (if we can avoid all stateful operations and make Symbol attributes atomic), but parallelism may not be the appealing choice
* Make GOT deduplication feasible

Since this patch moves a large chunk of code out of ELFT templates. My x86-64
executable is actually a few hundred bytes smaller.

For ppc32-ifunc-nonpreemptible-pic.s: I remove absolute relocation references to non-preemptible ifunc
because absolute relocation references are incorrect in -fpie mode.

Reviewed By: peter.smith, ikudrin

Differential Revision: https://reviews.llvm.org/D114783
2021-12-13 09:56:52 -08:00
George Koehler
885fb9a257 [ELF][PPC32] Make R_PPC32_PLTREL retain .got
PLT usage needs the first 12 bytes of the .got section. We need to keep .got and
DT_GOT_PPC even if .got/_GLOBAL_OFFSET_TABLE_ are not referenced (large PIC code
may only reference .got2), which is the case in OpenBSD's ld.so, leading
to a misleading error, "unsupported insecure BSS PLT object".

Fix this by adding R_PPC32_PLTREL to the list of hasGotOffRel.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D114982
2021-12-02 15:28:37 -08:00
Fangrui Song
353fe72ca3 [ELF] Hint -z nostart-stop-gc for __start_ undefined references
Make users aware what to do with ld.lld 13.0.0 / GNU ld<2015-10 --gc-sections
behavior.

Differential Revision: https://reviews.llvm.org/D114830
2021-12-02 11:58:25 -08:00
Fangrui Song
5047e3a3ba [ELF] Move GOT/PLT relocation code closer. NFC 2021-11-29 23:10:04 -08:00
Fangrui Song
7051aeef7a [ELF] Rename BaseCommand to SectionCommand. NFC
BaseCommand was picked when PHDRS/INSERT/etc were not implemented. Rename it to
SectionCommand to match `sectionCommands` and make it clear that the commands
are used in SECTIONS (except a special case for SymbolAssignment).

Also, improve naming of some BaseCommand variables (base -> cmd).
2021-11-25 20:24:23 -08:00
Fangrui Song
6188fd4957 [ELF] Rename OutputSection::sectionCommands to commands. NFC
This partially reverts r315409: the description applies to LinkerScript, but not
to OutputSection.

The name "sectionCommands" is used in both LinkerScript::sectionCommands and
OutputSection::sectionCommands, which may lead to confusion.
"commands" in OutputSection has no ambiguity because there are no other types
of commands.
2021-11-25 16:47:07 -08:00
Fangrui Song
d71bb6a409 [ELF] Inline isPPC64SmallCodeModelTocReloc which is only called once. NFC 2021-11-09 20:41:05 -08:00
Fangrui Song
bec28ee1ea [ELF] Move isStaticLinkTimeConstant closer to the only caller processRelocAux. NFC 2021-11-09 20:37:46 -08:00
Fangrui Song
2f7366c89d [ELF] Simplify R_DTPREL. NFC 2021-10-31 20:30:00 -07:00
Fangrui Song
9f8ffaaa0b [ELF] Replace "symbol '...' has no type" diagnostic with "relocation ... cannot be used against symbol '...'"
The "symbol 'foo' has no type" diagnostic tries to inform that copy
relocation/canonical PLT entry cannot be used, but the diagnostic is often
incorrect and confusing.
2021-10-31 13:12:26 -07:00
Fangrui Song
164194a5af [ELF] Untangle R_GOT style TLS IE and processRelocAux. NFC 2021-10-31 12:38:36 -07:00
Fangrui Song
55e69ece72 [ELF] Remove -Wl,-z,notext hint
The hint does not pull its weight:

* adding -Wl,-z,notext often won't work (relocation types other than `symbolRel`, e.g. `R_AARCH64_LDST32_ABS_LO12_NC`)
* for pure (no assembly) C/C++ projects, the "-fPIC" hint is sufficient
2021-10-31 12:10:43 -07:00
Fangrui Song
b76aacef5f [ELF] Simplify isStaticLinkTimeConstant. NFC 2021-10-31 10:46:42 -07:00
Fangrui Song
e39c138f45 [ELF] Implement TLSDESC for x86-32
`-z rela` is also supported.

Tested with:

```
cat > ./a.c <<eof
#include <assert.h>
int foo();
int bar();
int main() {
  assert(foo() == 2);
  assert(foo() == 4);
  assert(bar() == 2);
  assert(bar() == 4);
}
eof

cat > ./b.c <<eof
#include <stdio.h>
__thread int tls0;
extern __thread int tls1;
int foo() { return ++tls0 + ++tls1; }
static __thread int tls2, tls3;
int bar() { return ++tls2 + ++tls3; }
eof

echo '__thread int tls1;' > ./c.c

sed 's/        /\t/' > ./Makefile <<'eof'
.MAKE.MODE = meta curDirOk=true

CC := gcc -m32 -g -fpic -mtls-dialect=gnu2
LDFLAGS := -m32 -Wl,-rpath=.

all: a0 a1 a2

run: all
        ./a0 && ./a1 && ./a2

c.so: c.o; ${LINK.c} -shared $> -o $@
bc.so: b.o c.o; ${LINK.c} -shared $> -o $@
b.so: b.o c.so; ${LINK.c} -shared $> -o $@

a0: a.o b.o c.o; ${LINK.c} $> -o $@
a1: a.o b.so; ${LINK.c} $> -o $@
a2: a.o bc.so; ${LINK.c} $> -o $@
eof
```
and glibc `elf/tst-gnu2-tls1`.

`/usr/local/bin/ld` points to the freshly built `lld`.

`bmake run && bmake CFLAGS=-O1 run` => ok.

Differential Revision: https://reviews.llvm.org/D112582
2021-10-28 17:52:03 -07:00
Fangrui Song
2b1e32410c [ELF] Change common diagnostics to report both object file location and source file location
Many diagnostics use `getErrorPlace` or `getErrorLocation` to report a location.
In the presence of line table debug information, `getErrorPlace` uses a source
file location and ignores the object file location. However, the object file
location is sometimes more useful.

This patch changes "undefined symbol" and "out of range" diagnostics to report
both object/source file locations. Other diagnostics can use similar format if
needed.

The key idea is to let `InputSectionBase::getLocation` report the object file
location and use `getSrcMsg` for source file/line information. `getSrcMsg`
doesn't leverage `STT_FILE` information yet, but I think the temporary lack of
the functionality is ok.

For the ARM "branch and link relocation" diagnostic, I arbitrarily place the
source file location at the end of the line. The diagnostic is not very common
so its formatting doesn't need to be pretty.

Differential Revision: https://reviews.llvm.org/D112518
2021-10-28 09:38:45 -07:00
Fangrui Song
ecc93ed2d7 [ELF] Replace InputBaseSection::{areRelocsRela,firstRelocation,numRelocation} with relSecIdx
For `InputSection` `.foo`, its `InputBaseSection::{areRelocsRela,firstRelocation,numRelocation}` basically
encode the information of `.rel[a].foo`. However, one uint32_t (the relocation section index)
suffices. See the implementation of `relsOrRelas`.

This change decreases sizeof(InputSection) from 184 to 176 on 64-bit Linux.

The maximum resident set size linking a large application (1.2G output) decreases by 0.39%.

Differential Revision: https://reviews.llvm.org/D112513
2021-10-27 09:51:07 -07:00
Fangrui Song
ca8105b76c [ELF][X86] Support R_X86_64_PLTOFF64
For a function call (using the default `-fplt`), GCC `-mcmodel=large` generates an assembly modifier which
leads to an R_X86_64_PLTOFF64 relocation. In real world,
http://git.ageinghacker.net/jitter (used by GNU poke) uses `-mcmodel=large`.

R_X86_64_PLTOFF64's formula is (if preemptible) `L - GOT + A` or (if non-preemptible) `S - GOT + A`
where `GOT` is (confusingly) the address of `.got.plt`

Reviewed By: peter.smith

Differential Revision: https://reviews.llvm.org/D112386
2021-10-25 13:05:17 -07:00
Fangrui Song
a14ccaf509 [ELF] Support 128-bit bitmask in oneof(RelExpr)
Taken from Chih-Mao Chen's D100835.

RelExpr has 64 bits now and needs the extension to support new members
(`R_PLT_GOTPLT` for `R_X86_64_PLTOFF64` support).

Note: RelExpr needs to have at least a member >=64 to prevent
-Wtautological-constant-out-of-range-compare for `if (expr >= 64)`.

Reviewed By: arichardson, peter.smith

Differential Revision: https://reviews.llvm.org/D112385
2021-10-25 13:05:17 -07:00
Fangrui Song
3726039561 [ELF] Simplify addGotEntry. NFC 2021-08-29 13:40:08 -07:00
Fangrui Song
d3fdc312b2 [ELF] Untangle TLS IE and regular GOT from addGotEntry for non-mips. NFC 2021-08-29 13:21:06 -07:00