This adds support for the LoongArch ELF psABI v2.00 [1] relocation
model to LLD. The deprecated stack-machine-based psABI v1 relocs are not
supported.
The code is tested by successfully bootstrapping a Gentoo/LoongArch
stage3, complete with common GNU userland tools and both the LLVM and
GNU toolchains (GNU toolchain is present only for building glibc,
LLVM+Clang+LLD are used for the rest). Large programs like QEMU are
tested to work as well.
[1]: https://loongson.github.io/LoongArch-Documentation/LoongArch-ELF-ABI-EN.html
Reviewed By: MaskRay, SixWeining
Differential Revision: https://reviews.llvm.org/D138135
This patch adds a check that threadIndex is used only by threads
created by ThreadPoolExecutor. This helps catch two types of errors:
1. If a thread is not created by ThreadPoolExecutor, its index may clash
with the index of another thread. Using threadIndex in that case may
lead to a data race.
2. The index of the main thread (threadIndex == 0) currently clashes with
the index of thread0 in ThreadPoolExecutor threads. That may lead
to a data race if the main thread and thread0 are executed concurrently.
This patch allows executing tasks on the main thread only when
parallel::strategy.ThreadsRequested == 1. In all other cases,
assertions check that threadIndex != UINT_MAX (i.e. that the task
is executed on a thread created by ThreadPoolExecutor).
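A minimal sketch of the check, assuming a global `thread_local unsigned threadIndex` that defaults to `UINT_MAX` and is set only by executor threads. `getThreadIndex` here is an illustrative stand-in, and `runTask` is a hypothetical helper that mimics a worker assigning its own index before running a task; neither is the llvm::parallel API.

```cpp
#include <cassert>
#include <climits>
#include <functional>

// UINT_MAX marks a thread that was not created by the executor.
thread_local unsigned threadIndex = UINT_MAX;

unsigned getThreadIndex() {
  // On a foreign thread the index could clash with a pool thread's index,
  // so using it is treated as a bug.
  assert(threadIndex != UINT_MAX && "thread not created by the executor");
  return threadIndex;
}

// Hypothetical: runs `task` as pool thread `index` would, after the worker
// has assigned its own index, then restores the previous value.
void runTask(const std::function<void()> &task, unsigned index) {
  unsigned saved = threadIndex;
  threadIndex = index;
  task();
  threadIndex = saved;
}
```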
Differential Revision: https://reviews.llvm.org/D148916
This patch allows specifying that some tasks should be executed
in sequential order. This makes it possible to avoid a
conditional for separating sequential tasks:
```
// Before:
TaskGroup tg;
for (...) {
  if (condition)
    fn();
  else
    tg.spawn([]() { fn(); });
}
// After:
for (...)
  tg.spawn([]() { fn(); }, condition);
```
It also prevents execution on the main thread, which allows adding
the checks for the getThreadIndex() function discussed in D142318.
The patch also replaces std::stack with std::deque in the
ThreadPoolExecutor to get a natural (FIFO) execution order when
parallel::strategy.ThreadsRequested == 1.
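The two changes can be sketched together in a simplified, single-threaded model. This is not ThreadPoolExecutor itself: `SketchTaskGroup`, `runAll`, and the `seqWork` queue are hypothetical, and only the `spawn(fn, sequential)` shape follows the commit message. Taking tasks from the front of a std::deque gives first-in, first-out order, whereas popping a std::stack would run the most recently spawned task first.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical sketch, not llvm::parallel. Tasks are taken from the front of
// a deque, so one worker runs them in submission order.
class SketchTaskGroup {
public:
  void spawn(std::function<void()> f, bool sequential = false) {
    (sequential ? seqWork : work).push_back(std::move(f));
  }

  // Single-threaded drain. In a real pool, `work` tasks would run
  // concurrently while `seqWork` tasks run one at a time, in order.
  void runAll() {
    drain(work);
    drain(seqWork);
  }

private:
  static void drain(std::deque<std::function<void()>> &q) {
    while (!q.empty()) {
      std::function<void()> f = std::move(q.front());
      q.pop_front();
      f();
    }
  }
  std::deque<std::function<void()>> work, seqWork;
};
```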
Differential Revision: https://reviews.llvm.org/D148728
D73518 mentioned non-STT_SECTION symbol names. This patch extends the code to
handle STT_SECTION symbols, where we report the section name.
This change helps at least the following cases with very little code.
* Whether an out-of-range relocation is due to code or data.
* For a relocation in .debug_info, which referenced `.debug_*` section causes the problem (due to the DWARF32 limitation).
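A rough sketch of the naming rule, using a hypothetical, trimmed-down symbol record (`Sym`, `SymType`, and `describeTarget` are illustrative names, and the message wording is made up): for a section symbol the diagnostic reports the section name, since such symbols usually have no useful name of their own.

```cpp
#include <cassert>
#include <string>

// SymType::Section plays the role of STT_SECTION.
enum class SymType { NoType, Object, Func, Section };

struct Sym {
  SymType type;
  std::string name;        // typically empty for section symbols
  std::string sectionName; // section the symbol is defined relative to
};

// Name to mention when a relocation against `s` fails.
std::string describeTarget(const Sym &s) {
  if (s.type == SymType::Section)
    return "section '" + s.sectionName + "'";
  return "symbol '" + s.name + "'";
}
```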
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D145199
Fix https://github.com/llvm/llvm-project/issues/60392
```
// a.cc
void raise() { throw 42; }
bool foo() {
try { raise(); } catch (int) { return true; }
return false;
}
int main() { foo(); }
```
```
clang++ --target=x86_64-linux-gnu -fno-pic -mcmodel=large -no-pie -fuse-ld=lld -z notext a.cc -o a && ./a
clang++ --target=aarch64-linux-gnu -fno-pic -no-pie -fuse-ld=lld -Wl,--dynamic-linker=/usr/aarch64-linux-gnu/lib/ld-linux-aarch64.so.1 -Wl,-rpath=/usr/aarch64-linux-gnu/lib -z notext a.cc -o a && ./a
```
Both commands fail because we produce a dynamic relocation for
R_X86_64_64/R_AARCH64_ABS64 in .eh_frame, which will be adjusted to a wrong
offset by `SectionBase::getOffset` after D122459.
Since GNU ld uses a canonical PLT entry instead of a dynamic relocation for
.eh_frame, we do the same to avoid the issue.
Mips has an ABI issue (https://github.com/llvm/llvm-project/issues/5837) and we
don't implement GNU ld's DW_EH_PE_absptr conversion. mips64-eh-abs-reloc.s wants
a dynamic relocation, so keep the original behavior for EM_MIPS.
Differential Revision: https://reviews.llvm.org/D143136
to prepare for changing `relocations` from a SmallVector to a pointer.
Also change the `isec` parameter in `addAddendOnlyRelocIfNonPreemptible` to `GotSection &`.
Add LLVM_LIBRARY_VISIBILITY to remove unneeded GOT and unique_ptr
indirection. We can move other global variables into ctx without worrying
about indirection. In the long term we may consider passing Ctx
as a parameter to various functions and eliminate global state as
much as possible and then remove `Ctx::reset`.
`config` has 1000+ uses, so we try to avoid changing `config->foo`. Define a
wrapper with LLVM_LIBRARY_VISIBILITY to remove unneeded GOT and unique_ptr
indirection.
My x86-64 lld executable is 11+KiB smaller.
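A sketch of the wrapper idea, assuming `LLVM_LIBRARY_VISIBILITY` expands to `__attribute__((visibility("hidden")))` on ELF hosts (it is redefined locally as `SKETCH_LIBRARY_VISIBILITY` so the block stands alone). `Config` and its fields are placeholders. The `operator->` overload keeps every existing `config->foo` call site compiling unchanged, while the object is stored directly in a hidden global instead of behind a `unique_ptr`.

```cpp
#include <cassert>
#include <memory>

#if defined(__GNUC__) && !defined(_WIN32)
#define SKETCH_LIBRARY_VISIBILITY __attribute__((visibility("hidden")))
#else
#define SKETCH_LIBRARY_VISIBILITY
#endif

struct Config {
  bool relax = false;
  unsigned threads = 1;
};

// Before: a global unique_ptr, so every access loads the pointer first.
std::unique_ptr<Config> oldConfig = std::make_unique<Config>();

// After: the object lives in a hidden global; operator-> preserves the
// `config->foo` spelling used at existing call sites.
struct ConfigWrapper {
  Config c;
  Config *operator->() { return &c; }
};
SKETCH_LIBRARY_VISIBILITY ConfigWrapper config;
```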
Symbol::replace intends to overwrite a few fields (mostly Elf{32,64}_Sym
fields), but the implementation copies all fields then restores some old fields.
This is error-prone and wasteful. Add Symbol::overwrite to copy just the
needed fields and add other overwrite member functions to copy the extra
fields.
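The difference between the two approaches can be sketched on a hypothetical, trimmed-down `Symbol` (field names are illustrative, not LLD's layout): `replace` copies the whole object and then restores what must survive, while `overwrite` touches only the fields a new definition owns, so adding a field no longer risks a forgotten restore.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

struct Symbol {
  // Elf_Sym-like fields owned by the definition.
  uint8_t binding = 0, type = 0, stOther = 0;
  // Bookkeeping that must survive a new definition.
  std::string name;
  bool isUsedInRegularObj = false;

  // Old approach: copy everything, then restore the fields to keep.
  void replace(const Symbol &other) {
    Symbol old = *this;
    *this = other;
    name = old.name;
    isUsedInRegularObj = old.isUsedInRegularObj;
  }

  // New approach: copy only what the new definition provides.
  void overwrite(const Symbol &other) {
    binding = other.binding;
    type = other.type;
    stOther = other.stOther;
  }
};
```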
The setup in https://reviews.llvm.org/D133003#3806508 reproduces non-determinism with
--threads=4. Making the config serial fixes the non-determinism (verified by running
the link many times and comparing the output).
On Unix platforms, this wrapper function is inline, so it should
expand to the same direct access to the thread-local variable. On
Windows, it's a non-inline function within Parallel.cpp, which allows
the thread_local variable to be made static.
Windows native TLS doesn't support direct access to thread-local
variables in a different DLL, and GCC/binutils on Windows occasionally
have problems with non-static thread-local variables too.
This fixes mingw dylib builds with native TLS after
e6aebff674.
At the same time, move the whole thread-local variable inside
`#if LLVM_ENABLE_THREADS` to fix builds without threading support.
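A single-file sketch of the two layouts, using the hypothetical names `parallel::threadIndex`, `getThreadIndex`, and `setThreadIndex`; the "header" and ".cpp" halves are marked in comments, and the initial value `UINT_MAX` is an arbitrary choice for this sketch. On non-Windows hosts the inline accessor reads the extern variable directly; on Windows the accessor is out of line, so the variable can be `static`.

```cpp
#include <cassert>
#include <climits>

namespace parallel {
// --- "header" part ---
#if defined(_WIN32)
// Out-of-line accessor: the variable never crosses a DLL boundary.
unsigned getThreadIndex();
#else
// Visible variable plus inline accessor: a direct TLS access.
extern thread_local unsigned threadIndex;
inline unsigned getThreadIndex() { return threadIndex; }
#endif

// --- ".cpp" part ---
#if defined(_WIN32)
static thread_local unsigned threadIndex = UINT_MAX;
unsigned getThreadIndex() { return threadIndex; }
#else
thread_local unsigned threadIndex = UINT_MAX;
#endif

// Hypothetical helper a worker thread would call once at startup.
void setThreadIndex(unsigned i) { threadIndex = i; }
} // namespace parallel
```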
Differential Revision: https://reviews.llvm.org/D133759
* Change `Symbol::flags` to a `std::atomic<uint16_t>`
* Add `llvm::parallel::threadIndex` as a thread-local non-negative integer
* Add `relocsVec` to part.relaDyn and part.relrDyn so that relative relocations can be added without a mutex
* Arbitrarily change -z nocombreloc to move relative relocations to the end. Disable parallelism for deterministic output.
MIPS and PPC64 use global state for relocation scanning, so keep serial scanning for them.
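A minimal sketch of the mutex-free pattern, assuming each scanning thread has a small, dense thread index. `RelaDynSketch`, `SymbolSketch`, and `NEEDS_GOT` are hypothetical names. Each thread appends only to its own vector; the single-threaded merge then sorts by offset, which is this sketch's own way of making the result independent of scheduling, not necessarily what LLD does.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

struct RelativeReloc {
  uint64_t offset;
};

struct RelaDynSketch {
  explicit RelaDynSketch(unsigned numThreads) : relocsVec(numThreads) {}

  // Safe to call concurrently: a thread only touches its own vector.
  void addRelativeReloc(unsigned threadIndex, RelativeReloc r) {
    relocsVec[threadIndex].push_back(r);
  }

  // Called once after scanning. Concatenation order depends on how work
  // was split, so sort by offset for deterministic output.
  std::vector<RelativeReloc> merge() const {
    std::vector<RelativeReloc> out;
    for (const std::vector<RelativeReloc> &v : relocsVec)
      out.insert(out.end(), v.begin(), v.end());
    std::sort(out.begin(), out.end(),
              [](const RelativeReloc &a, const RelativeReloc &b) {
                return a.offset < b.offset;
              });
    return out;
  }

  std::vector<std::vector<RelativeReloc>> relocsVec;
};

// Atomic flags let concurrent scanners set bits without a lock.
constexpr uint16_t NEEDS_GOT = 1 << 0; // made-up flag value
struct SymbolSketch {
  std::atomic<uint16_t> flags{0};
};
```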
Speed-up with mimalloc and --threads=8 on an Intel Skylake machine:
* clang (Release): 1.27x as fast
* clang (Debug): 1.06x as fast
* chrome (default): 1.05x as fast
* scylladb (default): 1.04x as fast
Speed-up with glibc malloc and --threads=16 on a ThunderX2 (AArch64):
* clang (Release): 1.31x as fast
* scylladb (default): 1.06x as fast
Reviewed By: andrewng
Differential Revision: https://reviews.llvm.org/D133003
This simplifies SymbolTableSection<ELFT>::writeTo. Add dsoProtected for use
in canDefineSymbolInExecutable; as a side benefit, the protected DSO
preemption diagnostic becomes clearer.
This change renames this method to match its original name and the name
used in the wasm linker.
Back in d8f8abbd4a the ELF SymbolTable
method `getSymbols()` was replaced with `forEachSymbol`.
Then in a2fc964417 `forEachSymbol` was
replaced with a `llvm::iterator_range`.
Then in e9262edf0d we came full circle
and the `llvm::iterator_range` was replaced with a `symbols()` accessor
that was identical to the original `getSymbols()`.
`getSymbols` also matches the name used elsewhere in the ELF linker as
well as in both the COFF and wasm backends (e.g. `InputFiles.h` and
`SyntheticSections.h`).
Differential Revision: https://reviews.llvm.org/D130787
Some tests (e.g. aarch64-feature-pac.s) segfault in libstdc++ _GLIBCXX_DEBUG
builds (enabled by LLVM_ENABLE_EXPENSIVE_CHECKS).
dyn_cast<ThunkSection> is incorrectly true for any SyntheticSection. std::merge
transitively calls mergeCmp(x, x) (due to __glibcxx_requires_irreflexive_pred)
and will segfault in `ta->getTargetInputSection()`. The dyn_cast<ThunkSection>
issue should eventually be fixed properly, but `a != b` is robust enough for now.
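A sketch of why the `a != b` guard helps, using a hypothetical `Sec` record whose `thunkTarget` stands in for the `ThunkSection` query (the real comparator inspects `ThunkSection` objects). With the guard, `mergeCmp(x, x)` returns false before the tie-breaker runs, which is the call debug-mode `std::merge` makes to verify irreflexivity.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

struct Sec {
  unsigned outSecOff;
  const Sec *thunkTarget = nullptr; // set only for thunk sections
};

// Orders by output offset; on a tie, a thunk goes right before its target.
// The `a != b` check keeps mergeCmp(x, x) false and skips the thunk query.
bool mergeCmp(const Sec *a, const Sec *b) {
  if (a->outSecOff != b->outSecOff)
    return a->outSecOff < b->outSecOff;
  return a != b && a->thunkTarget == b;
}
```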
This simplifies code, removes a read32 (for id==0 check), and makes it feasible
to combine some operations in EhInputSection::split and EhFrameSection::addRecords.
Mostly NFC, but fixes a "Relocation not in any piece" assertion failure in an
erroneous case where a relocation offset precedes all CIE/FDE pieces.
Alternative to D125036. Implement R_RISCV_ALIGN relaxation so that we can handle
-mrelax object files (i.e. -mno-relax is no longer needed) and create a
framework for future relaxation.
`relaxAux` is placed in a union with InputSectionBase::jumpInstrMod, storing
auxiliary information for relaxation. In the first pass, `relaxAux` is allocated.
The main data structure is `relocDeltas`: when referencing `relocations[i]`, the
actual offset is `r_offset - (i ? relocDeltas[i-1] : 0)`.
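A worked sketch of the offset formula, assuming `relocDeltas[i]` holds the cumulative number of bytes removed up to and including `relocations[i]` (which is what the `relocDeltas[i-1]` term implies). `computeRelocDeltas`, `relaxedOffset`, and the per-relocation `bytesRemovedAt` input are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// bytesRemovedAt[i]: bytes dropped at relocations[i] (e.g. surplus NOPs of
// an R_RISCV_ALIGN). Result: running total of removed bytes.
std::vector<uint32_t> computeRelocDeltas(const std::vector<uint32_t> &bytesRemovedAt) {
  std::vector<uint32_t> deltas(bytesRemovedAt.size());
  uint32_t total = 0;
  for (std::size_t i = 0; i < bytesRemovedAt.size(); ++i) {
    total += bytesRemovedAt[i];
    deltas[i] = total;
  }
  return deltas;
}

// Offset of relocations[i] after relaxation:
// r_offset - (i ? relocDeltas[i-1] : 0).
uint64_t relaxedOffset(const std::vector<uint64_t> &rOffset,
                       const std::vector<uint32_t> &relocDeltas, std::size_t i) {
  return rOffset[i] - (i ? relocDeltas[i - 1] : 0);
}
```

For example, with relocations at 0x0, 0x8, 0x10, and 0x20 that drop 0, 4, 0, and 2 bytes, the deltas are {0, 4, 4, 6}, and the third relocation moves from 0x10 to 12.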
`relaxOnce` performs one relaxation pass. It computes `relocDeltas` for all text
sections. Then it adjusts st_value/st_size for symbols relative to each section
based on `SymbolAnchor`. `bytesDropped` is set so that `assignAddresses` knows
that the size has changed.
Run `relaxOnce` in the `finalizeAddressDependentContent` loop to wait for
convergence of text sections and other address dependent sections (e.g.
SHT_RELR). Note: extracting `relaxOnce` into a separate loop works for many cases
but has issues in some linker script edge cases.
After convergence, compute section contents: shrink the NOP sequence of each
R_RISCV_ALIGN as appropriate. Instead of deleting bytes, we run a sequence of
memcpy calls on the content delimited by relocation locations. For R_RISCV_ALIGN,
the next memcpy skips the desired number of bytes. Section content computation is
parallelizable, but let's ensure the implementation is mature before
optimizing. Technically we could save a copy by interleaving some code with
`OutputSection::writeTo`, but let's not pollute the generic code (we don't have
templated relocation resolving, so adding conditions could impose overhead on
non-RISCV targets).
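A sketch of the copy-based compaction, assuming the relocation offsets are sorted and `remove[i]` gives how many bytes to drop starting at `rOffset[i]` (for R_RISCV_ALIGN, the surplus NOP bytes). `compact` is a hypothetical helper: it copies each span between relocation offsets with one memcpy and starts the next span past the dropped bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

std::vector<uint8_t> compact(const std::vector<uint8_t> &old,
                             const std::vector<uint64_t> &rOffset,
                             const std::vector<uint32_t> &remove) {
  uint32_t total = 0;
  for (uint32_t r : remove)
    total += r;
  std::vector<uint8_t> out(old.size() - total);
  uint64_t from = 0; // read position in old
  uint64_t to = 0;   // write position in out
  for (std::size_t i = 0; i < rOffset.size(); ++i) {
    uint64_t size = rOffset[i] - from; // bytes up to this relocation
    if (size)
      std::memcpy(out.data() + to, old.data() + from, size);
    to += size;
    from = rOffset[i] + remove[i]; // skip the dropped bytes
  }
  if (old.size() > from) // tail after the last relocation
    std::memcpy(out.data() + to, old.data() + from, old.size() - from);
  return out;
}
```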
Tested:
`make ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- LLVM=1 defconfig all` built Linux kernel using -mrelax is bootable.
FreeBSD RISCV64 system using -mrelax is bootable.
bash/curl/firefox/libevent/vim/tmux using -mrelax works.
Differential Revision: https://reviews.llvm.org/D127581
Similar to D117734. Take AArch64 as an example, where the branch range is +-0x8000000.
getISDThunkSec returns `ts` when `src-0x8000000-r_addend <= tsBase < src-0x8000000`
and the new thunk will be placed in `ts` (`ts->addThunk(t)`). However, the new
thunk (at the end of ts) may be unreachable from src. In the next pass,
`normalizeExistingThunk` reverts the relocation back to the original target.
Then a new thunk is created and the same `ts` is picked as before. The `ts` is
still unreachable.
I have observed it in one test with a sufficiently large r_addend (47664): there
are initially 245 Thunks, then in each pass 14 new Thunks are created and
appended to the unreachable ThunkSection. After 15 passes lld fails with
`thunk creation not converged`.
The new test aarch64-thunk-reuse2.s checks the case.
Without `- pcBias`, arm-thumb-thunk-empty-pass.s and arm-thunk-multipass-plt.s
will fail.
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D124653
https://discourse.llvm.org/t/parallel-input-file-parsing/60164
initializeSymbols currently sets Defined::section and handles non-prevailing
COMDAT groups. Move the code to the parallel postParse to reduce work in the
single-threaded code path and make parallel section initialization feasible.
Postpone reporting duplicate symbol errors so that the messages have the
section information. (`Defined::section` is assigned in postParse and another
thread may not have the information).
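A sketch of deferring the diagnostic, assuming a hypothetical `Symbol` whose `section` is filled in by the owning file's postParse, possibly on another thread and possibly later. Duplicates found during the parallel phase are only recorded; messages are built single-threaded afterwards, when every section is known. The class names and message wording are illustrative.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

struct Symbol {
  std::string name;
  std::string section; // assigned by the owning file's postParse
};

struct DuplicateDef {
  const Symbol *sym;   // existing definition
  std::string file;    // file with the clashing definition
  std::string section; // section of the clashing definition
};

class DeferredDuplicates {
public:
  // May be called from parallel postParse tasks.
  void record(DuplicateDef d) {
    std::lock_guard<std::mutex> lock(mu);
    pending.push_back(std::move(d));
  }

  // Single-threaded, after every postParse has finished.
  std::vector<std::string> report() const {
    std::vector<std::string> msgs;
    for (const DuplicateDef &d : pending)
      msgs.push_back("duplicate symbol: " + d.sym->name + " in " +
                     d.sym->section + " and " + d.section + " (" + d.file + ")");
    return msgs;
  }

private:
  std::mutex mu;
  std::vector<DuplicateDef> pending;
};
```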
* duplicated-synthetic-sym.s: BinaryFile duplicate definition (very rare) now
has no section information
* comdat-binding: `%t/w.o %t/g.o` leads to an undesired undefined symbol. This
is not ideal but we report a diagnostic to inform that this is unsupported.
(See release note)
* comdat-discarded-lazy.s: %tdef.o is not extracted. The new behavior (discarded
section error) makes more sense.
* i386-comdat.s: switched to a better approach working around
.gnu.linkonce.t.__x86.get_pc_thunk.bx in glibc<2.32 for x86-32.
Drop the ancient no-longer-relevant workaround for __i686.get_pc_thunk.bx
Depends on D120640
Differential Revision: https://reviews.llvm.org/D120626
https://discourse.llvm.org/t/parallel-input-file-parsing/60164
initializeSymbols currently sets Defined::section and handles non-prevailing
COMDAT groups. Move the code to the parallel postParse to reduce work in the
single-threaded code path and make parallel section initialization feasible.
Postpone reporting duplicate symbol errors so that the messages have the
section information. (`Defined::section` is assigned in postParse and another
thread may not have the information).
* duplicated-synthetic-sym.s: BinaryFile duplicate definition (very rare) now
has no section information
* comdat-binding: `%t/w.o %t/g.o` leads to an undesired undefined symbol. This
is not ideal but we report a diagnostic to inform that this is unsupported.
(See release note)
* comdat-discarded-lazy.s: %tdef.o is not extracted. The new behavior (discarded
section error) makes more sense.
Depends on D120640
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D120626
Symbol.h depends on InputFiles.h. This change moves us toward dropping the
weird dependency.
The call sites will become slightly uglier (`cast<SharedFile>(s->file)`), but
the compromise is acceptable.
In many call sites we know decompression cannot happen (non-SHF_ALLOC, or the
data, even if compressed, must have been decompressed by a previous pass).
Prefer rawData in these cases: data() increases code size and prevents
optimization on rawData.
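A sketch of the cost difference, using a hypothetical `SectionSketch` class: `data()` must test for compression on every call and may take a slow path, whereas `rawData` is a plain member access. The `(count, byte)` expansion in `decompress` is only a stand-in for a real decompressor such as zlib.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

class SectionSketch {
public:
  SectionSketch(std::vector<uint8_t> bytes, bool isCompressed)
      : rawData(std::move(bytes)), compressed(isCompressed) {}

  // Branch on every call, plus a possible slow path.
  const std::vector<uint8_t> &data() {
    if (compressed)
      decompress();
    return rawData;
  }

  // Callers that know decompression cannot happen, or already happened,
  // read this directly.
  std::vector<uint8_t> rawData;

private:
  // Stand-in "decompression": expands (count, byte) pairs.
  void decompress() {
    std::vector<uint8_t> out;
    for (std::size_t i = 0; i + 1 < rawData.size(); i += 2)
      out.insert(out.end(), static_cast<std::size_t>(rawData[i]), rawData[i + 1]);
    rawData = std::move(out);
    compressed = false;
  }

  bool compressed;
};
```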