Commit Graph

1745 Commits

Author SHA1 Message Date
Fangrui Song
698ac4aba5 [ELF] Add PT_RISCV_ATTRIBUTES program header
Close https://github.com/llvm/llvm-project/issues/63084

Unlike AArch32, RISC-V defines PT_RISCV_ATTRIBUTES to include the
SHT_RISCV_ATTRIBUTES section. There is no real-world use case yet.

We place PT_RISCV_ATTRIBUTES after PT_GNU_STACK, similar to PT_ARM_EXIDX. GNU ld
places PT_RISCV_ATTRIBUTES earlier, but the placement should not matter.

Link: https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/71

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D152065
2023-06-06 13:06:21 -07:00
Fangrui Song
8d85c96e0e [lld] StringRef::{starts,ends}with => {starts,ends}_with. NFC
The latter form is now preferred to be similar to C++20 starts_with.
This replacement also removes one function call when startswith is not inlined.
2023-06-05 14:36:19 -07:00
Fangrui Song
8aea109504 [ELF] x86-64: place .lrodata, .lbss, and .ldata away from code sections
The x86-64 medium code model utilizes large data sections, namely .lrodata,
.lbss, and .ldata (along with some variants of .ldata). There is a proposal to
extend the use of large data sections to the large code model as well[1].

This patch aims to place large data sections away from code sections in order to
alleviate relocation overflow pressure caused by code sections referencing
regular data sections.

```
.lrodata
.rodata
.text     # if --ro-segment, MAXPAGESIZE alignment
RELRO     # MAXPAGESIZE alignment
.data     # MAXPAGESIZE alignment
.bss
.ldata    # MAXPAGESIZE alignment
.lbss
```

In comparison to GNU ld, which places .lbss, .lrodata, and .ldata after .bss, we
place .lrodata above .rodata to minimize the number of permission transitions in
the memory image.

While GNU ld places .lbss after .bss, the subsequent sections don't reuse the
file offset bytes of BSS.

Our approach is to place .ldata and .lbss after .bss and create a PT_LOAD
segment for .bss to large data section transition in the absence of SECTIONS
commands. assignFileOffsets ensures we insert an alignment instead of allocating
space for BSS, and therefore we don't waste more than MAXPAGESIZE bytes. We have
a missing optimization to prevent all waste, but implementing it would introduce
complexity and likely be error-prone.

GNU ld's layout introduces 2 more MAXPAGESIZE alignments while ours
introduces just one.

[1]: https://groups.google.com/g/x86-64-abi/c/jnQdJeabxiU "Large data sections for the large code model"

With help from Arthur Eubanks.

Co-authored-by: James Y Knight <jyknight@google.com>

Reviewed By: aeubanks, tkoeppe

Differential Revision: https://reviews.llvm.org/D150510
2023-05-25 07:35:38 -07:00
Petr Hosek
811cbfc262 [lld][ELF] Implement –print-memory-usage
This option was introduced in GNU ld in
https://sourceware.org/legacy-ml/binutils/2015-06/msg00086.html and is
often used in embedded development. This change implements this option
in LLD matching the GNU ld output verbatim.

Differential Revision: https://reviews.llvm.org/D150644
2023-05-25 07:14:18 +00:00
Fangrui Song
2473b1af08 [ELF] Simplify getSectionRank and rewrite comments
Replace some RF_ flags with integer literals.
Rewrite the isWrite/isExec block to make the code block order reflect
the section order.
Rewrite some imprecise comments.

This is NFC, if we don't count invalid cases such as non-writable TLS
and non-writable RELRO.
2023-05-12 23:58:39 -07:00
Fangrui Song
a2648bc4ea [ELF] Remove remnant ranks for PPC64 ELFv1 special sections 2023-05-12 23:21:14 -07:00
Peter Smith
7a2000ac53 [LLD][ARM] Handle .ARM.exidx sections at non-zero output sec offset
Embedded systems that do not use an ELF loader locate the
.ARM.exidx exception table via linker defined __exidx_start and
__exidx_end rather than use the PT_ARM_EXIDX program header. This
means that some linker scripts such as the picolibc C library's
linker script, do not have the .ARM.exidx sections at offset 0 in
the OutputSection. For example:

.except_unordered : {
    . = ALIGN(8);
    PROVIDE(__exidx_start = .);
    *(.ARM.exidx*)
    PROVIDE(__exidx_end = .);
} >flash AT>flash :text

This is within the specification of Arm exception tables, and is
handled correctly by ld.bfd.

This patch has 2 parts. The first updates the writing of the data
of the .ARM.exidx SyntheticSection to account for a non-zero
OutputSection offset. The second part makes the PT_ARM_EXIDX program
header generation a special case so that it covers only the
SyntheticSection and not the parent OutputSection. While not strictly
necessary for programs locating the exception tables via the symbols
it may cause ELF utilities that locate the exception tables via
the PT_ARM_EXIDX program header to fail. This does not seem to be the
case for GNU and LLVM readelf which seems to look for the
SHT_ARM_EXIDX section.

Differential Revision: https://reviews.llvm.org/D148033
2023-04-14 10:09:46 +01:00
Craig Topper
85444794cd [lld][RISCV] Implement GP relaxation for R_RISCV_HI20/R_RISCV_LO12_I/R_RISCV_LO12_S.
This implements support for relaxing these relocations to use the GP
register to compute addresses of globals in the .sdata and .sbss
sections.

This feature is off by default and must be enabled by passing
--relax-gp to the linker.

The GP register might not always be the "global pointer". It can
be used for other purposes. See discussion here
https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/371

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D143673
2023-04-13 10:52:15 -07:00
Peter Smith
7ef47dd54e [LLD] Increase thunk pass limit
In issue 61250 https://github.com/llvm/llvm-project/issues/61250 there is
an example of a program that takes 17 passes to converge, which is 2 more
than the current limit of 15. Analysis of the program shows a particular
section that is made up of many roughly thunk sized chunks of code ending
in a call to a symbol that needs a thunk. Due to the positioning of the
section, at each pass a subset of the calls go out of range of their
original thunk, needing a new one created, which then pushes more thunks
out of range. This process eventually stops after 17 passes.

This patch is the simplest fix for the problem, which is to increase
the pass limit. I've chosen to double it which should comfortably
account for future cases like this, while only taking a few more
seconds to reach the limit in case of non-convergence.

As discussed in the issue, there could be some additional work done
to limit thunk reuse, this would potentially increase the number of
thunks in the program but should speed up convergence.

Differential Revision: https://reviews.llvm.org/D145786
2023-03-13 10:04:39 +00:00
Fangrui Song
07d0a4fb53 [ELF][RISCV] Make .sdata and .sbss closer
GNU ld's internal linker scripts for RISC-V place .sdata and .sbss close.
This makes GP relaxation more profitable.

While here, when .sbss is present, set `__bss_start` to the start of
.sbss instead of .bss, to match GNU ld.

Note: GNU ld's internal linker scripts have symbol assignments and input
section descriptions which are not relevant for modern systems. We only
add things that make sense.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D145118
2023-03-07 10:37:04 -08:00
Peter Collingbourne
82c2fcffc2 ELF: Respect MEMORY command when specified without a SECTIONS command.
We were previously ignoring the MEMORY command unless SECTIONS was also
specified. Fix it.

Differential Revision: https://reviews.llvm.org/D145132
2023-03-01 22:40:32 -08:00
Kazu Hirata
55e2cd1609 Use llvm::count{lr}_{zero,one} (NFC) 2023-01-28 12:41:20 -08:00
serge-sans-paille
984b800a03 Move from llvm::makeArrayRef to ArrayRef deduction guides - last part
This is a follow-up to https://reviews.llvm.org/D140896, split into
several parts as it touches a lot of files.

Differential Revision: https://reviews.llvm.org/D141298
2023-01-10 11:47:43 +01:00
Gregory Alfonso
d22f050e15 Remove redundant .c_str() and .get() calls
Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D139485
2022-12-18 00:33:53 +00:00
Fangrui Song
d98c172712 [ELF] Fix TimeTraceScope for "Finalize .eh_frame" 2022-12-03 18:00:51 +00:00
Guillaume Chatelet
08e2a76381 [lld][NFC] rename ELF alignment into addralign 2022-12-01 16:20:12 +00:00
Fangrui Song
910204cfbd [ELF] createSyntheticSections: simplify config->relocatable. NFC
We can add .riscv.attributes synthetic section here in the future.
2022-11-22 20:09:15 -08:00
Fangrui Song
8610cb0460 [ELF] -r: don't define _TLS_MODULE_BASE_
_TLS_MODULE_BASE_ is supposed to be defined by the final link. Defining it in a
relocatable link may render the final link value incorrect.

GNU ld i386/x86-64 have the same issue: https://sourceware.org/bugzilla/show_bug.cgi?id=29820
2022-11-22 12:59:45 -08:00
Fangrui Song
d9ef5574d4 [ELF] -r: don't define __global_pointer$
This symbol is supposed to be defined by the final executable link. The new
behavor matches GNU ld.
2022-11-22 12:37:51 -08:00
Fangrui Song
9015e41f0f [ELF] addRelIpltSymbols: make it explicit some passes are for non-relocatable links. NFC
and prepare for __global_pointer$ and _TLS_MODULE_BASE_ fix.
2022-11-22 11:38:57 -08:00
Fangrui Song
2bf5d86422 [ELF] Change rawData to content() and data() to contentMaybeDecompress()
Clarify data() which may trigger decompression and make it feasible to refactor
the member variable rawData.
2022-11-20 22:43:22 +00:00
Kazu Hirata
d42357007d [lld] Use llvm::reverse (NFC) 2022-11-06 08:39:41 -08:00
Fangrui Song
14f996dca8 [ELF] Move inputSections/ehInputSections into Ctx. NFC 2022-10-16 00:49:48 -07:00
Fangrui Song
1837333dac [ELF] --check-sections: allow address 0xffffffff for ELFCLASS32
Fix https://github.com/llvm/llvm-project/issues/58101
2022-10-01 15:37:07 -07:00
Fangrui Song
9c626d4a0d [ELF] Remove symtab indirection. NFC
Add LLVM_LIBRARY_VISIBILITY to remove unneeded GOT and unique_ptr indirection.
2022-10-01 14:46:49 -07:00
Fangrui Song
34fa860048 [ELF] Remove ctx indirection. NFC
Add LLVM_LIBRARY_VISIBILITY to remove unneeded GOT and unique_ptr
indirection. We can move other global variables into ctx without
indirection concern. In the long term we may consider passing Ctx
as a parameter to various functions and eliminate global state as
much as possible and then remove `Ctx::reset`.
2022-10-01 12:06:33 -07:00
Nico Weber
cd7ffa2e52 lld: Include name of output file in "failed to write output" diag
Differential Revision: https://reviews.llvm.org/D133110
2022-09-14 14:57:47 -04:00
Fangrui Song
12607f57da [ELF] Cache compute_thread_count. NFC 2022-09-12 19:09:08 -07:00
Fangrui Song
e6aebff674 [ELF] Parallelize relocation scanning
* Change `Symbol::flags` to a `std::atomic<uint16_t>`
* Add `llvm::parallel::threadIndex` as a thread-local non-negative integer
* Add `relocsVec` to part.relaDyn and part.relrDyn so that relative relocations can be added without a mutex
* Arbitrarily change -z nocombreloc to move relative relocations to the end. Disable parallelism for deterministic output.

MIPS and PPC64 use global states for relocation scanning. Keep serial scanning.

Speed-up with mimalloc and --threads=8 on an Intel Skylake machine:

* clang (Release): 1.27x as fast
* clang (Debug): 1.06x as fast
* chrome (default): 1.05x as fast
* scylladb (default): 1.04x as fast

Speed-up with glibc malloc and --threads=16 on a ThunderX2 (AArch64):

* clang (Release): 1.31x as fast
* scylladb (default): 1.06x as fast

Reviewed By: andrewng

Differential Revision: https://reviews.llvm.org/D133003
2022-09-12 12:56:35 -07:00
Fangrui Song
94ca041905 [ELF] Move scanRelocations into Relocations.cpp. NFC 2022-09-04 21:31:18 -07:00
Fangrui Song
3b4d800911 [ELF] Parallelize writes of different OutputSections
We currently process one OutputSection at a time and for each OutputSection
write contained input sections in parallel. This strategy does not leverage
multi-threading well. Instead, parallelize writes of different OutputSections.

The default TaskSize for parallelFor often leads to inferior sharding. We
prepare the task in the caller instead.

* Move llvm::parallel::detail::TaskGroup to llvm::parallel::TaskGroup
* Add llvm::parallel::TaskGroup::execute.
* Change writeSections to declare TaskGroup and pass it to writeTo.

Speed-up with --threads=8:

* clang -DCMAKE_BUILD_TYPE=Release: 1.11x as fast
* clang -DCMAKE_BUILD_TYPE=Debug: 1.10x as fast
* chrome -DCMAKE_BUILD_TYPE=Release: 1.04x as fast
* scylladb build/release: 1.09x as fast

On M1, many benchmarks are a small fraction of a percentage faster. Mozilla showed the largest difference with the patch being about 1.03x as fast.

Differential Revision: https://reviews.llvm.org/D131247
2022-08-24 09:40:03 -07:00
Sam Clegg
2cd4cd9a32 [lld][ELF] Rename SymbolTable::symbols() to SymbolTable::getSymbols(). NFC
This change renames this method match its original name and the name
used in the wasm linker.

Back in d8f8abbd4a the ELF SymbolTable
method `getSymbols()` was replaced with `forEachSymbol`.

Then in a2fc964417 `forEachSymbol` was
replaced with a `llvm::iterator_range`.

Then in e9262edf0d we came full circle
and the `llvm::iterator_range` was replaced with a `symbols()` accessor
that was identical the original `getSymbols()`.

`getSymbols` also matches the name used elsewhere in the ELF linker as
well as in both COFF and wasm backend (e.g. `InputFiles.h` and
`SyntheticSections.h`)

Differential Revision: https://reviews.llvm.org/D130787
2022-08-19 14:56:08 -07:00
Alex Brachet
dbd04b853b [ELF] Support --package-metadata
This was recently introduced in GNU linkers and it makes sense for
ld.lld to have the same support. This implementation omits checking if
the input string is valid json to reduce size bloat.

Differential Revision: https://reviews.llvm.org/D131439
2022-08-08 21:31:58 +00:00
Fangrui Song
c09d323599 [ELF] Move EhInputSection out of inputSections. NFC
inputSections temporarily contains EhInputSection objects mainly for
combineEhSections. Place EhInputSection objects into a new vector
ehInputSections instead of inputSections.
2022-07-31 11:58:08 -07:00
Fangrui Song
0a28cfdff5 [ELF] Simplify getRankProximity. NFC 2022-07-30 16:32:42 -07:00
Fangrui Song
2e2d5304f0 [ELF] Move combineEhSections from Writer to SyntheticSections. NFC
This not only places the function in the right place, but also allows inlining addSection.
2022-07-29 00:47:30 -07:00
Fangrui Song
c72973608d [ELF] Combine EhInputSection removal and MergeInputSection removal. NFC 2022-07-29 00:39:57 -07:00
Fangrui Song
8d4b11b4f1 [ELF] Remove redundant isa<InputSection>(sec). NFC
combineEhSections has been called to remove EhInputSection.
2022-07-29 00:30:52 -07:00
Fangrui Song
85cfd91723 [ELF] Optimize some non-constant alignTo with alignToPowerOf2. NFC
My x86-64 lld executable is 2KiB smaller. .eh_frame writing gets faster as there
were lots of divisions.
2022-07-24 11:20:49 -07:00
Fangrui Song
51b9e099d5 [ELF] Reword --no-allow-shlib-undefined diagnostic
Use a format more similar to unresolved references from regular object
files. It's probably easier to read for people who are less familiar
with the linker diagnostics.

Reviewed By: ikudrin

Differential Revision: https://reviews.llvm.org/D129790
2022-07-15 01:29:58 -07:00
YongKang Zhu
2324c2e3c3 [LLD] Two tweaks to symbol ordering scheme
When `--symbol-ordering-file` is specified, the linker today will always put
hot contributions in the middle of cold ones when targeting RISC machine, so
to minimize the chances that branch thunks need be generated for hot code
calling into cold code. This is not necessary when user specifies an ordering
of read-only data (vs. function) symbols, or when output section is small such
that no branch thunk would ever be required. The latter is common for mobile
apps. For example, among all the native ARM64 libraries in Facebook Instagram
App for Android, 80% of them have text section smaller than 64KB and the
largest text section seen is less than 8MB, well below the distance that a
BRANCH26 can reach.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D128382
2022-07-12 11:34:17 -07:00
Fangrui Song
6611d58f5b [ELF] Relax R_RISCV_ALIGN
Alternative to D125036. Implement R_RISCV_ALIGN relaxation so that we can handle
-mrelax object files (i.e. -mno-relax is no longer needed) and creates a
framework for future relaxation.

`relaxAux` is placed in a union with InputSectionBase::jumpInstrMod, storing
auxiliary information for relaxation. In the first pass, `relaxAux` is allocated.
The main data structure is `relocDeltas`: when referencing `relocations[i]`, the
actual offset is `r_offset - (i ? relocDeltas[i-1] : 0)`.

`relaxOnce` performs one relaxation pass. It computes `relocDeltas` for all text
section. Then, adjust st_value/st_size for symbols relative to this section
based on `SymbolAnchor`. `bytesDropped` is set so that `assignAddresses` knows
that the size has changed.

Run `relaxOnce` in the `finalizeAddressDependentContent` loop to wait for
convergence of text sections and other address dependent sections (e.g.
SHT_RELR). Note: extrating `relaxOnce` into a separate loop works for many cases
but has issues in some linker script edge cases.

After convergence, compute section contents: shrink the NOP sequence of each
R_RISCV_ALIGN as appropriate. Instead of deleting bytes, we run a sequence of
memcpy on the content delimitered by relocation locations. For R_RISCV_ALIGN let
the next memcpy skip the desired number of bytes. Section content computation is
parallelizable, but let's ensure the implementation is mature before
optimizations. Technically we can save a copy if we interleave some code with
`OutputSection::writeTo`, but let's not pollute the generic code (we don't have
templated relocation resolving, so using conditions can impose overhead to
non-RISCV.)

Tested:
`make ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- LLVM=1 defconfig all` built Linux kernel using -mrelax is bootable.
FreeBSD RISCV64 system using -mrelax is bootable.
bash/curl/firefox/libevent/vim/tmux using -mrelax works.

Differential Revision: https://reviews.llvm.org/D127581
2022-07-07 10:16:09 -07:00
Fangrui Song
e0612c91cd [ELF] Optimize getInputSections. NFC
In the majority of cases (e.g. orphan sections), an OutputSection has at most
one InputSectionDescription (isd). By changing the return type to
ArrayRef<InputSection *> we can just reference the isd->sections. For
OutputSections with more than one InputSectionDescription we use a caller
provided SmallVector to copy the elements as before.

Reviewed By: peter.smith

Differential Revision: https://reviews.llvm.org/D129111
2022-07-05 23:31:09 -07:00
Fangrui Song
9a572164d5 [ELF] Move InputFiles global variables (memoryBuffers, objectFiles, etc) into Ctx. NFC 2022-06-29 18:53:38 -07:00
Nico Weber
7effcbda49 Rename parallelForEachN to just parallelFor
Patch created by running:

  rg -l parallelForEachN | xargs sed -i '' -c 's/parallelForEachN/parallelFor/'

No behavior change.

Differential Revision: https://reviews.llvm.org/D128140
2022-06-19 17:49:00 -04:00
Fangrui Song
2ac8ce5d56 Revert D125410 "[ELF] Align the end of PT_GNU_RELRO to max-page-size instead of common-page-size"
This reverts commit ebdb9d635a.

Changing p_memsz is insufficient and may make PT_GNU_RELRO extend beyond the
PT_LOAD.
2022-05-12 20:41:22 -07:00
Fangrui Song
ebdb9d635a [ELF] Align the end of PT_GNU_RELRO to max-page-size instead of common-page-size
We picked common-page-size to match GNU ld. Recently, the resolution to GNU ld
https://sourceware.org/bugzilla/show_bug.cgi?id=28824 (milestone: 2.39) switched
to max-page-size so that the last page can be protected by RELRO in case the
system page size is larger than common-page-size.

Thanks to our two RW PT_LOAD scheme (D58892), switching to max-page-size does
not change file size (while GNU ld's scheme may increase file size).

Reviewed By: peter.smith

Differential Revision: https://reviews.llvm.org/D125410
2022-05-12 11:03:12 -07:00
Fangrui Song
5a44980f0a [ELF] Support custom sections between DATA_SEGMENT_ALIGN and DATA_SEGMENT_RELRO_END
We currently hard code RELRO sections. When a custom section is between
DATA_SEGMENT_ALIGN and DATA_SEGMENT_RELRO_END, we may report a spurious
`error: section: ... is not contiguous with other relro sections`. GNU ld
makes such sections RELRO.

glibc recently switched to default --with-default-link=no. This configuration
places `__libc_atexit` and others between DATA_SEGMENT_ALIGN and
DATA_SEGMENT_RELRO_END. This patch allows such a ld.bfd --verbose
linker script to be fed into lld.

Reviewed By: peter.smith

Differential Revision: https://reviews.llvm.org/D124656
2022-05-04 01:10:46 -07:00
Fangrui Song
be01af4a0f [ELF] Fix non-relocatable-non-emit-relocs --gc-sections to discard .L symbols
This reverts commit 764cd491b1, which I
incorrectly assumed NFC partly because there were no test coverage for the
non-relocatable non-emit-relocs case before 9d6d936243fe343abe89323a27c7241b395af541.

The interaction of {,-r,--emit-relocs} {,--discard-locals} {,--gc-sections} is
complex but without -r/--emit-relocs, --gc-sections does need to discard .L
symbols like --no-gc-sections. The behavior matches GNU ld.
2022-04-07 14:34:32 -07:00
Mitch Phillips
786c89fed3 [ELF][MTE] Add --android-memtag-* options to synthesize ELF notes
This ELF note is aarch64 and Android-specific. It specifies to the
dynamic loader that specific work should be scheduled to enable MTE
protection of stack and heap regions.

Current synthesis of the ".note.android.memtag" ELF note is done in the
Android build system. We'd like to move that to the compiler. This patch
adds the --memtag-stack, --memtag-heap, and --memtag-mode={async, sync,
none} flags to the linker, which synthesises the note for us.

Future changes will add -fsanitize=memtag* flags to clang which will
pass these through to lld.

Depends on D119381.

Differential Revision: https://reviews.llvm.org/D119384
2022-04-04 11:17:36 -07:00