Currently we take the first SHT_RISCV_ATTRIBUTES (.riscv.attributes) as the
output. If we link an object without an extension with an object with the
extension, the output Tag_RISCV_arch may not contain the extension and some
tools like objdump -d will not decode the related instructions.
This patch implements
Tag_RISCV_stack_align/Tag_RISCV_arch/Tag_RISCV_unaligned_access merge as
specified by
https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-elf.adoc#attributes
For the deprecated Tag_RISCV_priv_spec{,_minor,_revision}, dump the attribute to
the output iff all input agree on the value. This is different from GNU ld but
our simple approach should be ok for deprecated tags.
`RISCVAttributeParser::handler` currently warns about unknown tags. This
behavior is retained. In GNU ld arm, tags >= 64 (mod 128) are ignored with a
warning. If RISC-V ever wants to do something similar
(https://github.com/riscv-non-isa/riscv-elf-psabi-doc/issues/352), consider
documenting it in the psABI and changing RISCVAttributeParser.
Like GNU ld, zero value integer attributes and empty string attributes are not
dumped to the output.
Reviewed By: asb, kito-cheng
Differential Revision: https://reviews.llvm.org/D138550
Allowing incorrect version scripts is not a helpful default. Flip that
to help users find their bugs at build time rather than at run time.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D135402
https://github.com/riscv/riscv-elf-psabi-doc/pull/190 introduced STO_RISCV_VARIANT_CC.
The linker should:
* Copy the STO_RISCV_VARIANT_CC bit to .symtab/.dynsym: already fulfilled after
82ed93ea05
* Produce DT_RISCV_VARIANT_CC if at least one R_RISCV_JUMP_SLOT relocation
references a symbol with the STO_RISCV_VARIANT_CC bit. Done by this patch.
Reviewed By: kito-cheng
Differential Revision: https://reviews.llvm.org/D107951
Solve two issues that showed up when using LLD with Unreal Engine & FASTBuild:
1. It seems the S_OBJNAME record doesn't always record the "precomp signature". We were relying on that to match the PCH.OBJ with their dependent-OBJ.
2. MSVC link.exe is able to link a PCH.OBJ when the "precomp signatureÈ doesn't match, but LLD was failing. This was occuring since the Unreal Engine Build Tool was compiling the PCH.OBJ, but the dependent-OBJ were compiled & cached through FASTBuild. Upon a clean rebuild, the PCH.OBJs were recompiled by the Unreal Build Tool, thus the "precomp signatures" were changing; however the OBJs were already cached by FASTBuild, thus having an old "precomp signatures".
We now ignore "precomp signatures" and properly fallback to cmd-line name lookup, like MSVC link.exe does, and only fail if the PCH.OBJ type stream doesn't match the count expected by the dependent-OBJ.
Differential Revision: https://reviews.llvm.org/D136762
Previously, we used SHA-1 for hashing the CodeView type records.
SHA-1 in `GloballyHashedType::hashType()` is coming top in the profiles. By simply replacing with BLAKE3, the link time is reduced in our case from 15 sec to 13 sec. I am only using MSVC .OBJs in this case. As a reference, the resulting .PDB is approx 2.1GiB and .EXE is approx 250MiB.
Differential Revision: https://reviews.llvm.org/D137101
MSVC records the command line arguments in S_ENVBLOCK, skipping the input file arguments.
This patch adds this filtering on lld-link side.
Differential Revision: https://reviews.llvm.org/D137723
Allowing incorrect version scripts is not a helpful default. Flip that
to help users find their bugs at build time rather than at run time.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D135402
Mach-O ld64 supports -w to suppress warnings. GNU ld 2.40 will support the
option as well (https://sourceware.org/bugzilla/show_bug.cgi?id=29654).
This feature has some small value. E.g. when analyzing a large executable with
relocation overflow issues, we may use --noinhibit-exec --emit-relocs to get an
output file with static relocations despite relocation overflow issues. -w can
significantly improve the link time as printing the massive warnings is slow.
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D136569
This reverts commit 096f93e73d.
Revert "[Libomptarget] Make the plugins ingore undefined exported symbols"
This reverts commit 3f62314c23.
Revert "[LLD] Enable --no-undefined-version by default."
This reverts commit 7ec8b0d162.
Three commits are reverted because of the current omp build fail
with GNU ld. See discussion here: https://reviews.llvm.org/rG096f93e73dc3
Allowing incorrect version scripts is not a helpful default. Flip that
to help users find their bugs at build time rather than at run time.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D135402
`clang -gz=zstd a.o` passes this option to the linker. This option compresses output
debug sections with zstd and sets ch_type to ELFCOMPRESS_ZSTD. As of today, very
few DWARF consumers recognize ELFCOMPRESS_ZSTD.
Use the llvm::zstd::compress API with level llvm::zstd::DefaultCompression (5),
which we may tune after we have more experience with zstd output.
zstd has built-in parallel compression support (so we don't need to do D117853
for zlib), which is not leveraged yet.
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D133548
so that lld accepts relocatable object files produced by `clang -c -g -gz=zstd`.
We don't want to increase the size of InputSection, so do redundant but cheap
ch_type checks instead.
Differential Revision: https://reviews.llvm.org/D129406
These will be LLD-specific options to support Control Flow Guard for the
MinGW target. They are disabled by default, but enabling `--guard-cf`
will also enable `--guard-longjmp` unless `--no-guard-longjmp` is also
specified. These options maps to `-guard:cf,[no]longjmp`.
Note that these features require the `_load_config_used` symbol to
contain the load config directory and be filled with the required
symbols. While current versions of mingw-w64 do not supply this symbol,
the user can provide their own version of it.
Reviewed By: MaskRay, rnk
Differential Revision: https://reviews.llvm.org/D132808
This is an entirely new embedded directive - extending the GNU ld
command line option --exclude-symbols to be usable in embedded
directives too.
(GNU ld.bfd also got support for the same new directive, currently in
the latest git version, after the 2.39 branch.)
This works as an inverse to the regular embedded dllexport directives,
for cases when autoexport of all eligible symbols is performed.
Differential Revision: https://reviews.llvm.org/D130120
This adds support for the existing GNU ld command line option, which
allows excluding individual symbols from autoexport (when linking a
DLL and no symbols are marked explicitly as dllexported).
Differential Revision: https://reviews.llvm.org/D130118
This was recently introduced in GNU linkers and it makes sense for
ld.lld to have the same support. This implementation omits checking if
the input string is valid json to reduce size bloat.
Differential Revision: https://reviews.llvm.org/D131439
This just removes the code that gates the logic. The main issue here is
perf impact: without {D122258}, LLD takes a significant perf hit because
it now has to do a lot more work in the input parsing phase. But with
that change to eliminate unnecessary EH frames from input object files,
the perf overhead here is minimal. Concretely, here are the numbers for
some builds as measured on my 16-core Mac Pro:
**chromium_framework**
This is without the use of `-femit-dwarf-unwind=no-compact-unwind`:
base diff difference (95% CI)
sys_time 1.826 ± 0.019 1.962 ± 0.034 [ +6.5% .. +8.4%]
user_time 9.306 ± 0.054 9.926 ± 0.082 [ +6.2% .. +7.1%]
wall_time 8.225 ± 0.068 8.947 ± 0.128 [ +8.0% .. +9.6%]
samples 15 22
With that flag enabled, the regression mostly disappears, as hoped:
base diff difference (95% CI)
sys_time 1.839 ± 0.062 1.866 ± 0.068 [ -0.9% .. +3.8%]
user_time 9.452 ± 0.068 9.490 ± 0.067 [ -0.1% .. +0.9%]
wall_time 8.383 ± 0.127 8.452 ± 0.114 [ -0.1% .. +1.8%]
samples 17 21
**Unnamed internal app**
Without `-femit-dwarf-unwind`, this is the perf hit:
base diff difference (95% CI)
sys_time 1.372 ± 0.029 1.317 ± 0.024 [ -4.6% .. -3.5%]
user_time 2.835 ± 0.028 2.980 ± 0.027 [ +4.8% .. +5.4%]
wall_time 3.205 ± 0.079 3.383 ± 0.066 [ +4.9% .. +6.2%]
samples 102 83
With `-femit-dwarf-unwind`, the perf hit almost disappears:
base diff difference (95% CI)
sys_time 1.274 ± 0.026 1.270 ± 0.025 [ -0.9% .. +0.3%]
user_time 2.812 ± 0.023 2.822 ± 0.035 [ +0.1% .. +0.7%]
wall_time 3.166 ± 0.047 3.174 ± 0.059 [ -0.2% .. +0.7%]
samples 95 97
Just for fun, I measured the impact of `-femit-dwarf-unwind` on ld64
(`base` has the extra DWARF unwind info in the input object files,
`diff` doesn't):
base diff difference (95% CI)
sys_time 1.128 ± 0.010 1.124 ± 0.023 [ -1.3% .. +0.6%]
user_time 7.176 ± 0.030 7.106 ± 0.094 [ -1.5% .. -0.4%]
wall_time 7.874 ± 0.041 7.795 ± 0.121 [ -1.7% .. -0.3%]
samples 16 25
And for LLD:
base diff difference (95% CI)
sys_time 1.315 ± 0.019 1.280 ± 0.019 [ -3.2% .. -2.0%]
user_time 2.980 ± 0.022 2.822 ± 0.016 [ -5.5% .. -5.0%]
wall_time 3.369 ± 0.038 3.175 ± 0.033 [ -6.2% .. -5.3%]
samples 47 47
So parsing the extra EH frames is a lot more expensive for us than for
ld64. But given that we are quite a lot faster than ld64 to begin with,
I guess this isn't entirely unexpected...
Reviewed By: #lld-macho, oontvoo
Differential Revision: https://reviews.llvm.org/D129540
Add FORCE_LLD_DIAGNOSTICS_CRASH inspired by the existing
FORCE_CLANG_DIAGNOSTICS_CRASH.
This is particularly useful for people customizing LLD as they may
want to modify the crash reporting behavior.
Differential Revision: https://reviews.llvm.org/D128195
`--time-trace=foo` has the same behavior as `--time-trace --time-trace-file=<file>`
had previously.
Also, for mac, make --time-trace-granularity *not* imply --time-trace, to match
behavior of the ELF port.
Differential Revision: https://reviews.llvm.org/D128451
.zdebug is unlikely used any longer: gcc -gz switched from legacy
.zdebug to SHF_COMPRESSED with binutils 2.26 (2016), which has been
several years. clang 14 dropped -gz=zlib-gnu support. According to
Debian Code Search (`gz=zlib-gnu`), no project uses -gz=zlib-gnu.
Remove .zdebug support to (a) simplify code and (b) allow removal of llvm-mc's
--compress-debug-sections=zlib-gnu.
In case the old object file `a.o` uses .zdebug, run `objcopy --decompress-debug-sections a.o`
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D126793
We picked common-page-size to match GNU ld. Recently, the resolution to GNU ld
https://sourceware.org/bugzilla/show_bug.cgi?id=28824 (milestone: 2.39) switched
to max-page-size so that the last page can be protected by RELRO in case the
system page size is larger than common-page-size.
Thanks to our two RW PT_LOAD scheme (D58892), switching to max-page-size does
not change file size (while GNU ld's scheme may increase file size).
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D125410
D86142 introduced --fortran-common and defaulted it to true (matching GNU ld
but deviates from gold/macOS ld64). The default state was motivated by transparently
supporting some FORTRAN 77 programs (Fortran 90 deprecated common blocks).
Now I think it again. I believe we made a mistake to change the default:
* this is a weird and legacy rule, though the breakage is very small
* --fortran-common introduced complexity to parallel symbol resolution and will slow down it
* --fortran-common more likely causes issues when users mix COMMON and
STB_GLOBAL definitions (see https://github.com/llvm/llvm-project/issues/48570 and
https://maskray.me/blog/2022-02-06-all-about-common-symbols).
I have seen several issues in our internal projects and Android.
On the other hand, --no-fortran-common is safer since
COMMON/STB_GLOBAL have the same semantics related to archive member extraction.
Therefore I think we should switch back, not punishing the common uage.
A platform wanting --fortran-common can implement ld.lld as a shell script
wrapper around `lld -flavor gnu --fortran-common "$@"`.
Reviewed By: ikudrin, sfertile
Differential Revision: https://reviews.llvm.org/D122450
https://discourse.llvm.org/t/parallel-input-file-parsing/60164
initializeSymbols currently sets Defined::section and handles non-prevailing
COMDAT groups. Move the code to the parallel postParse to reduce work from the
single-threading code path and make parallel section initialization infeasible.
Postpone reporting duplicate symbol errors so that the messages have the
section information. (`Defined::section` is assigned in postParse and another
thread may not have the information).
* duplicated-synthetic-sym.s: BinaryFile duplicate definition (very rare) now
has no section information
* comdat-binding: `%t/w.o %t/g.o` leads to an undesired undefined symbol. This
is not ideal but we report a diagnostic to inform that this is unsupported.
(See release note)
* comdat-discarded-lazy.s: %tdef.o is unextracted. The new behavior (discarded
section error) makes more sense
* i386-comdat.s: switched to a better approach working around
.gnu.linkonce.t.__x86.get_pc_thunk.bx in glibc<2.32 for x86-32.
Drop the ancient no-longer-relevant workaround for __i686.get_pc_thunk.bx
Depends on D120640
Differential Revision: https://reviews.llvm.org/D120626
https://discourse.llvm.org/t/parallel-input-file-parsing/60164
initializeSymbols currently sets Defined::section and handles non-prevailing
COMDAT groups. Move the code to the parallel postParse to reduce work from the
single-threading code path and make parallel section initialization infeasible.
Postpone reporting duplicate symbol errors so that the messages have the
section information. (`Defined::section` is assigned in postParse and another
thread may not have the information).
* duplicated-synthetic-sym.s: BinaryFile duplicate definition (very rare) now
has no section information
* comdat-binding: `%t/w.o %t/g.o` leads to an undesired undefined symbol. This
is not ideal but we report a diagnostic to inform that this is unsupported.
(See release note)
* comdat-discarded-lazy.s: %tdef.o is unextracted. The new behavior (discarded
section error) makes more sense
Depends on D120640
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D120626
GNU ld 2.38 added -z pack-relative-relocs which is similar to
--pack-dyn-relocs=relr but synthesizes the `GLIBC_ABI_DT_RELR` version
dependency if a shared object named `libc.so.*` has a `GLIBC_2.*` version
dependency.
This is used to implement the (as some glibc folks call) version lockout
mechanism. Add this option, because glibc does not want to support
--pack-dyn-relocs=relr which does not add `GLIBC_ABI_DT_RELR`.
See https://maskray.me/blog/2021-10-31-relative-relocations-and-relr for
detail.
Close https://github.com/llvm/llvm-project/issues/53775
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D120701
This relands 73e585e44d (and 0574b5fc65), with a fix for
the failing test (by using Optional<StringRef>s instead of
making StringRef::empty() mean absence of value).
Differential Revision: https://reviews.llvm.org/D118070
Makes lld-link work in a non-MSVC shell by autodetecting MSVC toolchain. Also
adds support for /winsysroot and a few other switches.
All this is done by refactoring to share code with clang-cl's existing support
for the same.
Differential Revision: https://reviews.llvm.org/D118070
https://maskray.me/blog/2022-02-06-all-about-common-symbols#no-define-common
In GNU ld, -dc only affects -r links and causes COMMON symbols to be allocated.
--no-define-common is defined to make COMMON symbols undefined for -shared.
AIUI --no-define-common is a workaround around glibc 2.1 time and not really useful.
gold confuses --define-common with -d/FORCE_COMMON_ALLOCATION and implements
--define-common with -d semantics. Its --no-define-common is incompatible with
GNU ld.
In ld.lld, b2a23cf3c0 fixed the default -r
behavior for COMMON symbols but ported the incompatible gold
--[no-]define-common. To the best of my knowledge, no project uses -dp
--[no-]define-common. So just remove these options.
-d/-dc are used by the following projects:
* grub grub-core/genmod.sh.in uses -Wl,-r,-d (https://lists.gnu.org/archive/html/grub-devel/2022-02/msg00088.html)
* FreeBSD crunchgen uses -Wl,-dc (https://reviews.freebsd.org/D34215)
A no-op implementation works for them. Only when a program inspects relocatable
output by itself and does not recognize COMMON symbols, there may be a problem.
This is an extremely unlikely case.
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D119108
This updates all the non-runtime project release notes to use the
version number from CMake instead of the hard-coded version numbers
in conf.py.
It also hides warnings about pre-releases when the git suffix
is dropped from the LLVM version in CMake.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D112181
The deduplication requires a DenseMap of the same size of the local part of
.strtab . I optimized it in e205445434 but it is
still quite slow.
For Release build of clang, deduplication makes .strtab 1.1% smaller and makes the link 3% slower.
For chrome, deduplication makes .strtab 0.1% smaller and makes the link 6% slower.
I suggest that we only perform the optimization with -O2 (default is -O1).
Not deduplicating local symbol names will simplify parallel symbol table write.
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D118577
The current TLSDESC optimization code assumes:
```
leaq x@tlsdesc(%rip), %rax
call *x@tlscall(%rax) # adjacent
```
From https://gitlab.freedesktop.org/mesa/mesa/-/issues/5665 , it seems that the
two instructions may not be adjacent in GCC 10's output:
```
leaq x@tlsdesc(%rip), %rax
something else
call *x@tlscall(%rax)
```
This patch supports the case. While here, support non-RAX registers for
R_X86_64_GOTPC32_TLSDESC, in case the compiler generates inefficient:
```
leaq x@tlsdesc(%rip), %rcx # or %rdx, %rbx, %rdi, ...
movq %rcx, %rax
call *x@tlscall(%rax) # GNU ld/gold error for non-RAX
```
Differential Revision: https://reviews.llvm.org/D114416
This brings back the original version of D81359.
I have found several use cases now.
* Unlike GNU ld, LLD's relocation processing is one pass. If we decide to
optimize(relax) R_X86_64_{,REX_}GOTPCRELX, we will suppress GOT generation and
cannot undo the decision later. Optimizing R_X86_64_REX_GOTPCRELX can usually
make it easy to hit `relocation R_X86_64_REX_GOTPCRELX out of range` because
the distance to GOT is usually shorter. Without --no-relax, the user has to
recompile with `-Wa,-mrelax-relocations=no`.
* The option would help during my investigationg of the root cause of https://git.kernel.org/linus/09e43968db40c33a73e9ddbfd937f46d5c334924
* There is need for relaxation for AArch64 & RISC-V. Implementing this for
x86-64 improves consistency with little target-specific cost (two-line
X86_64.cpp change).
Reviewed By: alexander-shaposhnikov
Differential Revision: https://reviews.llvm.org/D113615
Similar to D69607 but for archive member extraction unrelated to GC. This patch adds --why-extract=.
Prior art:
GNU ld -M prints
```
Archive member included to satisfy reference by file (symbol)
a.a(a.o) main.o (a)
b.a(b.o) (b())
```
-M is mainly for input section/symbol assignment <-> output section mapping
(often huge output) and the information may appear ad-hoc.
Apple ld64
```
__Z1bv forced load of b.a(b.o)
_a forced load of a.a(a.o)
```
It doesn't say the reference file.
Arm's proprietary linker
```
Selecting member vsnprintf.o(c_wfu.l) to define vsnprintf.
...
Loading member vsnprintf.o from c_wfu.l.
definition: vsnprintf
reference : _printf_a
```
---
--why-extract= gives the user the full data (which is much shorter than GNU ld
-Map). It is easy to track a chain of references to one archive member with a
one-liner, e.g.
```
% ld.lld main.o a_b.a b_c.a c.a -o /dev/null --why-extract=- | tee stdout
reference extracted symbol
main.o a_b.a(a_b.o) a
a_b.a(a_b.o) b_c.a(b_c.o) b()
b_c.a(b_c.o) c.a(c.o) c()
% ruby -ane 'BEGIN{p={}}; p[$F[1]]=[$F[0],$F[2]] if $.>1; END{x="c.a(c.o)"; while y=p[x]; puts "#{y[0]} extracts #{x} to resolve #{y[1]}"; x=y[0] end}' stdout
b_c.a(b_c.o) extracts c.a(c.o) to resolve c()
a_b.a(a_b.o) extracts b_c.a(b_c.o) to resolve b()
main.o extracts a_b.a(a_b.o) to resolve a
```
Archive member extraction happens before --gc-sections, so this may not be a live path
under --gc-sections, but I think it is a good approximation in practice.
* Specifying a file avoids output interleaving with --verbose.
* Required `=` prevents accidental overwrite of an input if the user forgets `=`. (Most of compiler drivers' long options accept `=` but not ` `)
Differential Revision: https://reviews.llvm.org/D109572
This is available in GNU ld 2.35 and can be seen as a shortcut for multiple
--export-dynamic-symbol, or a --dynamic-list variant without the symbolic intention.
In the long term, this option probably should be preferred over --dynamic-list.
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D107317