--remap-inputs-file= can be specified multiple times, each naming a
remap file that contains `from-glob=to-file` lines or `#`-led comments.
('=' is used a separator a la -fdebug-prefix-map=)
--remap-inputs-file= can be used to:
* replace an input file. E.g. `"*/libz.so=exp/libz.so"` can replace a resolved
`-lz` without updating the input file list or (if used) a response file.
When debugging an application where a bug is isolated to one single
input file, this option gives a convenient way to test fixes.
* remove an input file with `/dev/null` (changed to `NUL` on Windows), e.g.
`"a.o=/dev/null"`. A build system may add unneeded dependencies.
This option gives a convenient way to test the result removing some inputs.
`--remap-inputs=a.o=aa.o` can be specified to provide one pattern without using
an extra file.
(bash/zsh process substitution is handy for specifying a pattern without using
a remap file, e.g. `--remap-inputs-file=<(printf 'a.o=aa.o')`, but it may be
unavailable in some systems. An extra file can be inconvenient for a build
system.)
Exact patterns are tested before wildcard patterns. In case of a tie, the first
patterns wins. This is an implementation detail that users should not rely on.
Co-authored-by: Marco Elver <elver@google.com>
Link: https://discourse.llvm.org/t/rfc-support-exclude-inputs/70070
Reviewed By: melver, peter.smith
Differential Revision: https://reviews.llvm.org/D148859
When --threads= is unspecified, we set it to
`parallel::strategy.compute_thread_count()`, which uses
sched_getaffinity (Linux)/cpuset_getaffinity (FreeBSD)/std::thread::hardware_concurrency (others).
With extensive testing on many machines (many configurations from
{aarch64,x86-64} x {Linux,FreeBSD,Windows} x allocators(native,mimalloc,rpmalloc) combinations)
with varying workloads, we discovered that when the concurrency is larger than
16, the linking process is slower than using --threads=16 due to parallelism
overhead outweighs optimizations. This is particularly harmful for machines with
many cores or when the link job competes with other jobs.
Cap parallel::strategy when --threads= is unspecified.
For some workloads changing the concurrency from 8 to 16 has nearly no improvement.
--thinlto-jobs= is unchanged since ThinLTO backend compiles are embarrassingly
parallel.
Link: https://discourse.llvm.org/t/avoidable-overhead-from-threading-by-default/69160
Reviewed By: peter.smith, andrewng
Differential Revision: https://reviews.llvm.org/D147493
This reverts commit da68d2164e.
This change is correct, but left a `config->threadCount` use that is error-prone
and may harm performance when parallel::strategy.compute_thread_count() > 16.
This implements support for relaxing these relocations to use the GP
register to compute addresses of globals in the .sdata and .sbss
sections.
This feature is off by default and must be enabled by passing
--relax-gp to the linker.
The GP register might not always be the "global pointer". It can
be used for other purposes. See discussion here
https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/371
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D143673
When --threads= is unspecified, we set it to
`parallel::strategy.compute_thread_count()`, which uses
sched_getaffinity (Linux)/cpuset_getaffinity (FreeBSD)/std::thread::hardware_concurrency (others).
With extensive testing on many machines (many configurations from
{aarch64,x86-64} x {Linux,FreeBSD,Windows} x allocators(native,mimalloc,rpmalloc) combinations)
with varying workloads, we discovered that when the concurrency is larger than
16, the linking process is slower than using --threads=16 due to parallelism
overhead outweighs optimizations. This is particularly harmful for machines with
many cores or when the link job competes with other jobs.
Cap parallel::strategy when --threads= is unspecified.
For some workloads changing the concurrency from 8 to 16 has nearly no improvement.
--thinlto-jobs= is unchanged since ThinLTO backend compiles are embarrassingly
parallel.
Link: https://discourse.llvm.org/t/avoidable-overhead-from-threading-by-default/69160
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D147493
Currently, the --thinlto-prefix-replace="oldpath;newpath" option is used during
distributed ThinLTO thin links to specify the mapping of the input bitcode object
files' directory tree (oldpath) to the directory tree (newpath) used for both:
1) the output files of the thin link itself (the .thinlto.bc index files and the
optional .imports files)
2) the specified object file paths written to the response file given in the
--thinlto-index-only=${response} option, which is used by the final native
link and must match the paths of the native object files that will be
produced by ThinLTO backend compiles.
This patch expands the --thinlto-prefix-replace option to allow a separate directory
tree mapping to be specified for the object file paths written to the response file
(number 2 above). This is important to support builds and build systems where the
same output directory may not be written by multiple build actions (e.g. the thin link
and the ThinLTO backend compiles).
The new format is: --thinlto-prefix-replace="origpath;outpath[;objpath]"
This replaces the origpath directory tree of the thin link input files with
outpath when writing the thin link index and imports outputs (number 1
above). If objpath is specified it replaces origpath of the input files with
objpath when writing the response file (number 2 above), otherwise it
falls back to the old behavior of using outpath for this as well.
Reviewed By: tejohnson, MaskRay
Differential Revision: https://reviews.llvm.org/D144596
Changes:
- Adding BE32 big endian Support for Arm.
- Replace the writele and readle with their endian-aware versions.
- Adding test cases for the big-endian be32 arm configuration.
Patch by: Milosz Plichta. This patch merges all the changes from
this patch https://reviews.llvm.org/D140203 as well.
Reviewed By: peter.smith, MaskRay
Differential Revision: https://reviews.llvm.org/D140202
Adds the new AArch64-ABI dynamic entry generation to LLD. This will
allow Android to move from the Android-specific ELF note onto the
dynamic entries.
Change the behaviour of an unspecified --android-memtag-mode. Now, when
unspecified, this will print a warning that you're doing a no-op, rather
than implicitly turning on sync mode. This is important for MTE globals
later, where a binary containing static tagged global descriptors
shouldn't have MTE turned on without specific intent being passed to the
linker.
For now, continue to emit the Android ELF note by default. In future, we
can probably make it only emit the note when provided a flag.
Do a quick NFC-cleanup of the ELF note while we're here. It doesn't
change anything about the ELF note itself, but makes it more clear to
the reader of the code what alignment requirements are being (previously
implicitly) met.
Reviewed By: fmayer, MaskRay
Differential Revision: https://reviews.llvm.org/D143769
Allow controlling the CodeGenOpt::Level independent of the LTO
optimization level in LLD via new options for the COFF, ELF, MachO, and
wasm frontends to lld. Most are spelled as --lto-CGO[0-3], but COFF is
spelled as -opt:lldltocgo=[0-3].
See D57422 for discussion surrounding the issue of how to set the CG opt
level. The ultimate goal is to let each function control its CG opt
level, but until then the current default means it is impossible to
specify a CG opt level lower than 2 while using LTO. This option gives
the user a means to control it for as long as it is not handled on a
per-function basis.
Reviewed By: MaskRay, #lld-macho, int3
Differential Revision: https://reviews.llvm.org/D141970
gcc warned like
../../lld/ELF/InputSection.cpp:75:37: warning: ISO C++11 requires at least one argument for the "..." in a variadic macro
75 | invokeELFT(parseCompressedHeader);
| ^
changes:
- BLX: The Arm architecture versions that support the branch and link
instruction (BLX), can rewrite BLs in place when a state change from Arm<->Thumb
is required. Armv4T does not have BLX and so needs thunks for state changes.
- v4T Thumb long branches needed their own thunk. We could have used the v6M
implementation, but v6M doesn't have Arm state and must resolve to rather
inefficient stack reshuffling. We also can't reuse v7 thumb thunks as they use
MOVV/MOVT, which wasn't available yet for v4T.
- Remove the `lack of BLX' warning. LLVM only supports Arm Architecture versions
upwards of v4, which we now all support in LLD.
- renamed existing thunks to better reflect their use:
ARMV5ABSLongThunk -> ARMV5LongLdrPcThunk,
ARMV5PILongThunk -> ARMV4PILongThunk
- removed isCompatibleWith method from ARMV5ABSLongThunk and ARMV5PILongThunk,
as they were identical to the ARMThunk parent class implementation.
Support for (efficient) position independent thunks for v4T will be added in a
follow-up patch, including possible related thunk renaming and code comment
cleanup.
Reviewed By: MaskRay, peter.smith
Differential Revision: https://reviews.llvm.org/D139888
Currently we take the first SHT_RISCV_ATTRIBUTES (.riscv.attributes) as the
output. If we link an object without an extension with an object with the
extension, the output Tag_RISCV_arch may not contain the extension and some
tools like objdump -d will not decode the related instructions.
This patch implements
Tag_RISCV_stack_align/Tag_RISCV_arch/Tag_RISCV_unaligned_access merge as
specified by
https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-elf.adoc#attributes
For the deprecated Tag_RISCV_priv_spec{,_minor,_revision}, dump the attribute to
the output iff all input agree on the value. This is different from GNU ld but
our simple approach should be ok for deprecated tags.
`RISCVAttributeParser::handler` currently warns about unknown tags. This
behavior is retained. In GNU ld arm, tags >= 64 (mod 128) are ignored with a
warning. If RISC-V ever wants to do something similar
(https://github.com/riscv-non-isa/riscv-elf-psabi-doc/issues/352), consider
documenting it in the psABI and changing RISCVAttributeParser.
Like GNU ld, zero value integer attributes and empty string attributes are not
dumped to the output.
Reviewed By: asb, kito-cheng
Differential Revision: https://reviews.llvm.org/D138550
Allowing incorrect version scripts is not a helpful default. Flip that
to help users find their bugs at build time rather than at run time.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D135402
When -fgpu-rdc is used for linking relocatable objects, clang driver launches
clang-offload-bundler to extract a device relocatable object from each input
relocatable object file and passes the extracted files to lld. The input relocatable
object file could either come from HIP program or C++ program. The relocatable
object file from C++ program does not contain device relocatable objects, therefore
clang-offload-bundler extracts an empty file and passes it to lld. lld treates
empty file as linker script. When there is no object input file to lld, lld
will emit error:
target emulation unknown: -m or at least one .o file required
This patch adds "elf64_amdgpu" to lld so that lld always know the target
no matter whether there are object input files or not.
Reviewed by: Artem Belevich, Fangrui Song
Differential Revision: https://reviews.llvm.org/D138221
Follows up on commit cd5d5ce235 by
additionally ignoring relative paths ending in "lto-wrapper.exe" as
can be the case for GCC cross-compiled for Windows.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D138065
Like D134013, but for COFF and Mach-O.
Also expand the ELF test a bit. I at first didn't realize that `getValue()` for
`-mllvm -foo=bar` would return `-foo=bar` instead of just `bar`, and so
I wrote the test to check if we indeed get this wrong. We don't, but
having the test for it seems nice, so I'm including it.
Differential Revision: https://reviews.llvm.org/D137971
Allowing incorrect version scripts is not a helpful default. Flip that
to help users find their bugs at build time rather than at run time.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D135402
Mach-O ld64 supports -w to suppress warnings. GNU ld 2.40 will support the
option as well (https://sourceware.org/bugzilla/show_bug.cgi?id=29654).
This feature has some small value. E.g. when analyzing a large executable with
relocation overflow issues, we may use --noinhibit-exec --emit-relocs to get an
output file with static relocations despite relocation overflow issues. -w can
significantly improve the link time as printing the massive warnings is slow.
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D136569
A reference to __real_foo should trigger archive extraction of the input file that defines foo, otherwise a link using --wrap=foo might fail to link with an undefined reference to foo.
This matches bfd linker behaviour.
Differential Revision: https://reviews.llvm.org/D135897
This reverts commit 096f93e73d.
Revert "[Libomptarget] Make the plugins ingore undefined exported symbols"
This reverts commit 3f62314c23.
Revert "[LLD] Enable --no-undefined-version by default."
This reverts commit 7ec8b0d162.
Three commits are reverted because of the current omp build fail
with GNU ld. See discussion here: https://reviews.llvm.org/rG096f93e73dc3
Allowing incorrect version scripts is not a helpful default. Flip that
to help users find their bugs at build time rather than at run time.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D135402
Add LLVM_LIBRARY_VISIBILITY to remove unneeded GOT and unique_ptr
indirection. We can move other global variables into ctx without
indirection concern. In the long term we may consider passing Ctx
as a parameter to various functions and eliminate global state as
much as possible and then remove `Ctx::reset`.
`config` has 1000+ uses so we try to avoid changing `config->foo`. Define a
wrapper with LLVM_LIBRARY_VISIBILITY to remove unneeded GOT and unique_ptr
indirection.
My x86-64 lld executable is 11+KiB smaller.
Symbol::replace intends to overwrite a few fields (mostly Elf{32,64}_Sym
fields), but the implementation copies all fields then restores some old fields.
This is error-prone and wasteful. Add Symbol::overwrite to copy just the
needed fields and add other overwrite member functions to copy the extra
fields.
They may modify thinlto optimization.
This patch only extends support for `-mllvm`. There is another way to
pass llvm flags, `-plugin-opt`, but its processing is different and will
be provided in a subsequent patch.
Differential Revision: https://reviews.llvm.org/D134013
`clang -gz=zstd a.o` passes this option to the linker. This option compresses output
debug sections with zstd and sets ch_type to ELFCOMPRESS_ZSTD. As of today, very
few DWARF consumers recognize ELFCOMPRESS_ZSTD.
Use the llvm::zstd::compress API with level llvm::zstd::DefaultCompression (5),
which we may tune after we have more experience with zstd output.
zstd has built-in parallel compression support (so we don't need to do D117853
for zlib), which is not leveraged yet.
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D133548
This change renames this method match its original name and the name
used in the wasm linker.
Back in d8f8abbd4a the ELF SymbolTable
method `getSymbols()` was replaced with `forEachSymbol`.
Then in a2fc964417 `forEachSymbol` was
replaced with a `llvm::iterator_range`.
Then in e9262edf0d we came full circle
and the `llvm::iterator_range` was replaced with a `symbols()` accessor
that was identical the original `getSymbols()`.
`getSymbols` also matches the name used elsewhere in the ELF linker as
well as in both COFF and wasm backend (e.g. `InputFiles.h` and
`SyntheticSections.h`)
Differential Revision: https://reviews.llvm.org/D130787
This was recently introduced in GNU linkers and it makes sense for
ld.lld to have the same support. This implementation omits checking if
the input string is valid json to reduce size bloat.
Differential Revision: https://reviews.llvm.org/D131439
This implements the last step of
https://discourse.llvm.org/t/parallel-input-file-parsing/60164 for the ELF port.
For an ELF object file, we previously did: parse, (parallel) initializeLocalSymbols, (parallel) postParseObjectFile.
Now we do: parse, (parallel) initSectionsAndLocalSyms, (parallel) postParseObjectFile.
initSectionsAndLocalSyms does most of input section initialization.
The sequential `parse` does SHT_ARM_ATTRIBUTES/SHT_RISCV_ATTRIBUTES/SHT_GROUP initialization for now.
Performance linking some programs with --threads=8 (glibc 2.33 malloc and mimalloc):
* clang: 1.05x as fast with glibc malloc, 1.03x as fast with mimalloc
* chrome: 1.04x as fast with glibc malloc, 1.03x as fast with mimalloc
* internal search program: 1.08x as fast with glibc malloc, 1.05x as fast with mimalloc
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D130810
I went over the output of the following mess of a command:
`(ulimit -m 2000000; ulimit -v 2000000; git ls-files -z | parallel --xargs -0 cat | aspell list --mode=none --ignore-case | grep -E '^[A-Za-z][a-z]*$' | sort | uniq -c | sort -n | grep -vE '.{25}' | aspell pipe -W3 | grep : | cut -d' ' -f2 | less)`
and proceeded to spend a few days looking at it to find probable typos
and fixed a few hundred of them in all of the llvm project (note, the
ones I found are not anywhere near all of them, but it seems like a
good start).
Differential Revision: https://reviews.llvm.org/D130982
inputSections temporarily contains EhInputSection objects mainly for
combineEhSections. Place EhInputSection objects into a new vector
ehInputSections instead of inputSections.