317 Commits

Author SHA1 Message Date
Jacek Caban
7c26407a20 [LLD][COFF] Clarify EC vs. native symbols in diagnostics on ARM64X (#130857)
On ARM64X, symbol names alone are ambiguous as they may refer to either
a native or an EC symbol. Append '(EC symbol)' or '(native symbol)' in
diagnostic messages to distinguish them.
2025-03-15 21:15:08 +01:00
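Note on 7c26407a20: a minimal sketch of the idea, not the code from the commit. `SymbolNamespace` and `diagnosticName` are hypothetical names, and the exact spacing of the suffix is an assumption.

```cpp
#include <string>

// Hypothetical stand-in for the two symbol namespaces of an ARM64X link.
enum class SymbolNamespace { Native, EC };

// Returns the name to print in a diagnostic. On a hybrid (ARM64X) link the
// namespace is appended; otherwise the bare name is already unambiguous.
std::string diagnosticName(const std::string &name, SymbolNamespace ns,
                           bool isHybrid) {
  if (!isHybrid)
    return name;
  return name + (ns == SymbolNamespace::EC ? " (EC symbol)"
                                           : " (native symbol)");
}
```

With this shape, two undefined-symbol errors for the same name in both namespaces produce distinguishable messages.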
Jacek Caban
b09dfbd699 [LLD][COFF] Add support for x86_64 archives on ARM64X (#128241)
If the ECSYMBOLS section is missing in the archive, the archive could be
either a native-only ARM64 or x86_64 archive. Check the machine type of
the object containing a symbol to determine which symbol table to use.
2025-02-22 11:20:58 +01:00
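Note on b09dfbd699: a rough sketch of selecting a symbol table from the machine type of an archive member. The enums and `namespaceForMember` are hypothetical, and the mapping of x86_64 objects to the EC namespace is an assumption of this sketch rather than something the message spells out.

```cpp
// Hypothetical, simplified machine types and symbol namespaces.
enum class Machine { ARM64, ARM64EC, AMD64 };
enum class SymbolNamespace { Native, EC };

// Assumption of this sketch: on ARM64X, only pure ARM64 objects belong to the
// native namespace; ARM64EC and x86_64 objects share the EC namespace.
SymbolNamespace namespaceForMember(Machine memberMachine) {
  return memberMachine == Machine::ARM64 ? SymbolNamespace::Native
                                         : SymbolNamespace::EC;
}
```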
Jacek Caban
fb01a28903 [LLD][COFF] Implement support for hybrid IAT on ARM64X (#124189)
In hybrid images, the PE header references a single IAT for both native
and EC views, merging entries where possible. When merging isn't
feasible, different imports are grouped together, and ARM64X relocations
are emitted as needed.
2025-01-26 22:11:40 +01:00
Nico Weber
d9b8120259 [lld/COFF] Fix -start-lib / -end-lib more after reviews.llvm.org/D116434 (#124294)
This is a follow-up to #120452 in a way.

Since lld/COFF does not yet insert all defined symbols in an obj file before all
undefineds (ELF and MachO do this, see #67445 and things linked from
there), it's possible that:

1. We add an obj file a.obj
2. a.obj contains an undefined that's in b.obj, causing b.obj to be
added
3. b.obj contains an undefined that's in a part of a.obj that's not yet
in the symbol table, causing a recursive load of a.obj, which adds the
symbols in there twice, leading to duplicate symbol errors.

For normal archives, `ArchiveFile::addMember()` has a `seen` check to
prevent this. For start-lib lazy objects, we can just check if the
archive is still lazy at the recursive call.

This bug is similar to issue #59162.

(Eventually, we'll probably want to do what the MachO and ELF ports do.)

Includes a test that caused duplicate symbol diagnostics before this
code change.
2025-01-24 13:14:21 -05:00
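Note on d9b8120259: the three-step scenario in the message can be modeled with a toy loader. This is a sketch under simplifying assumptions; `LazyObject` and `load` are hypothetical and do not mirror the lld classes. The point is only that checking the lazy flag at the recursive call keeps each object from being parsed twice.

```cpp
#include <string>
#include <vector>

// Toy model of a -start-lib object: it stays lazy until something needs it.
struct LazyObject {
  std::string name;
  bool lazy = true;
  int timesParsed = 0;
  // Objects that define symbols this object leaves undefined.
  std::vector<LazyObject *> needs;
};

// Load an object and, recursively, everything it needs.
void load(LazyObject &obj) {
  if (!obj.lazy)
    return;         // already loaded or being loaded: recursive request is a no-op
  obj.lazy = false; // flip before parsing so re-entrant calls see it
  ++obj.timesParsed;
  for (LazyObject *dep : obj.needs)
    load(*dep);
}
```

With a.obj and b.obj needing each other, a single `load(a)` parses each file exactly once instead of re-entering a.obj.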
Martin Storsjö
8eb99bbe6e Reland [LLD] [COFF] Fix linking MSVC generated implib header objects (#123916)
ecb5ea6a26 tried to fix cases when LLD
links what seems to be import library header objects from MSVC. However,
the fix seems incorrect; the review at https://reviews.llvm.org/D133627
concluded that if this (treating this kind of symbol as a common symbol)
is what link.exe does, it's fine.

However, this is most probably not what link.exe does. The symbol
mentioned in the commit message of
ecb5ea6a26 would be a common symbol with a
size of around 3 GB; this is probably not what was intended.

That commit tried to avoid running into the error ".idata$4 should not
refer to special section 0"; that issue is fixed for a similar style of
section symbols in 4a4a8a1476.

Therefore, revert ecb5ea6a26 and extend
the fix from 4a4a8a1476 to also work for
the section symbols in MSVC generated import libraries.

The main detail about them is that for symbols of type
IMAGE_SYM_CLASS_SECTION, the Value field is not an offset, but it is an
optional set of flags, corresponding to the Characteristics of the
section header (although it may be empty).

This is a reland of a previous version of this commit, earlier merged in
9457418e66 / #122811. The previous version
failed tests when run with address sanitizer. The issue was that the
synthesized coff_symbol_generic object actually will be used to access a
full coff_symbol16 or coff_symbol32 struct, see
DefinedCOFF::getCOFFSymbol. Therefore, we need to make a copy of the
full size of either of them.
2025-01-23 09:15:47 +02:00
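Note on 8eb99bbe6e: the last paragraph of the message is about copying a symbol record at its full on-disk size. A minimal sketch of that idea follows; `cloneSymbolRecord` and the constants are hypothetical names, and the code is not taken from the commit. The record sizes (18 bytes for regular objects, 20 bytes for bigobj) come from the COFF format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// On-disk COFF symbol records are 18 bytes in regular objects and 20 bytes in
// bigobj files (the section number field is wider).
constexpr std::size_t kSymbol16Size = 18;
constexpr std::size_t kSymbol32Size = 20;

// Hypothetical helper: copy a whole symbol record into owned storage so that
// later reads through the full-size layout stay inside the allocation.
std::vector<std::uint8_t> cloneSymbolRecord(const void *record, bool isBigObj) {
  assert(record != nullptr);
  const std::size_t size = isBigObj ? kSymbol32Size : kSymbol16Size;
  std::vector<std::uint8_t> copy(size);
  std::memcpy(copy.data(), record, size);
  return copy;
}
```

Copying only the leading, shared part of the record would leave fields such as the storage class outside the copy, which is the kind of read the sanitizer flagged.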
Thurston Dang
c53faf63ff Revert "[LLD] [COFF] Fix linking MSVC generated implib header objects" (#123877)
Reverts llvm/llvm-project#122811 due to buildbot breakage e.g.,
https://lab.llvm.org/buildbot/#/builders/52/builds/5421/steps/11/logs/stdio

ASan output from local re-run:
```
==2780289==ERROR: AddressSanitizer: use-after-poison on address 0x7e0b87e28d28 at pc 0x55a979a99e7e bp 0x7ffe4b18f0b0 sp 0x7ffe4b18f0a8
READ of size 1 at 0x7e0b87e28d28 thread T0
    #0 0x55a979a99e7d in getStorageClass /usr/local/google/home/thurston/buildbot_repro/llvm-project/llvm/include/llvm/Object/COFF.h:344
    #1 0x55a979a99e7d in isSectionDefinition /usr/local/google/home/thurston/buildbot_repro/llvm-project/llvm/include/llvm/Object/COFF.h:429:9
    #2 0x55a979a99e7d in getSymbols /usr/local/google/home/thurston/buildbot_repro/llvm-project/lld/COFF/LLDMapFile.cpp:54:42
    #3 0x55a979a99e7d in lld::coff::writeLLDMapFile(lld::coff::COFFLinkerContext const&) /usr/local/google/home/thurston/buildbot_repro/llvm-project/lld/COFF/LLDMapFile.cpp:103:40
    #4 0x55a979a16879 in (anonymous namespace)::Writer::run() /usr/local/google/home/thurston/buildbot_repro/llvm-project/lld/COFF/Writer.cpp:810:3
    #5 0x55a979a00aac in lld::coff::writeResult(lld::coff::COFFLinkerContext&) /usr/local/google/home/thurston/buildbot_repro/llvm-project/lld/COFF/Writer.cpp:354:15
    #6 0x55a97985f7ed in lld::coff::LinkerDriver::linkerMain(llvm::ArrayRef<char const*>) /usr/local/google/home/thurston/buildbot_repro/llvm-project/lld/COFF/Driver.cpp:2826:3
    #7 0x55a97984cdd3 in lld::coff::link(llvm::ArrayRef<char const*>, llvm::raw_ostream&, llvm::raw_ostream&, bool, bool) /usr/local/google/home/thurston/buildbot_repro/llvm-project/lld/COFF/Driver.cpp:97:15
    #8 0x55a9797f9793 in lld::unsafeLldMain(llvm::ArrayRef<char const*>, llvm::raw_ostream&, llvm::raw_ostream&, llvm::ArrayRef<lld::DriverDef>, bool) /usr/local/google/home/thurston/buildbot_repro/llvm-project/lld/Common/DriverDispatcher.cpp:163:12
    #9 0x55a9797fa3b6 in operator() /usr/local/google/home/thurston/buildbot_repro/llvm-project/lld/Common/DriverDispatcher.cpp:188:15
    #10 0x55a9797fa3b6 in void llvm::function_ref<void ()>::callback_fn<lld::lldMain(llvm::ArrayRef<char const*>, llvm::raw_ostream&, llvm::raw_ostream&, llvm::ArrayRef<lld::DriverDef>)::$_0>(long) /usr/local/google/home/thurston/buildbot_repro/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:46:12
    #11 0x55a97966cb93 in operator() /usr/local/google/home/thurston/buildbot_repro/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:69:12
    #12 0x55a97966cb93 in llvm::CrashRecoveryContext::RunSafely(llvm::function_ref<void ()>) /usr/local/google/home/thurston/buildbot_repro/llvm-project/llvm/lib/Support/CrashRecoveryContext.cpp:426:3
    #13 0x55a9797f9dc3 in lld::lldMain(llvm::ArrayRef<char const*>, llvm::raw_ostream&, llvm::raw_ostream&, llvm::ArrayRef<lld::DriverDef>) /usr/local/google/home/thurston/buildbot_repro/llvm-project/lld/Common/DriverDispatcher.cpp:187:14
    #14 0x55a979627512 in lld_main(int, char**, llvm::ToolContext const&) /usr/local/google/home/thurston/buildbot_repro/llvm-project/lld/tools/lld/lld.cpp:103:14
    #15 0x55a979628731 in main /usr/local/google/home/thurston/buildbot_repro/llvm_build_asan/tools/lld/tools/lld/lld-driver.cpp:17:10
    #16 0x7ffb8b202c89 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
    #17 0x7ffb8b202d44 in __libc_start_main csu/../csu/libc-start.c:360:3
    #18 0x55a97953ef60 in _start (/usr/local/google/home/thurston/buildbot_repro/llvm_build_asan/bin/lld+0x8fd1f60)
```
2025-01-21 20:40:07 -08:00
Martin Storsjö
9457418e66 [LLD] [COFF] Fix linking MSVC generated implib header objects (#122811)
ecb5ea6a26 tried to fix cases when LLD
links what seems to be import library header objects from MSVC. However,
the fix seems incorrect; the review at https://reviews.llvm.org/D133627
concluded that if this (treating this kind of symbol as a common symbol)
is what link.exe does, it's fine.

However, this is most probably not what link.exe does. The symbol
mentioned in the commit message of
ecb5ea6a26 would be a common symbol with a
size of around 3 GB; this is probably not what was intended.

That commit tried to avoid running into the error ".idata$4 should not
refer to special section 0"; that issue is fixed for a similar style of
section symbols in 4a4a8a1476.

Therefore, revert ecb5ea6a26 and extend
the fix from 4a4a8a1476 to also work for
the section symbols in MSVC generated import libraries.

The main detail about them is that for symbols of type
IMAGE_SYM_CLASS_SECTION, the Value field is not an offset, but it is an
optional set of flags, corresponding to the Characteristics of the
section header (although it may be empty).
2025-01-21 23:55:41 +02:00
Jacek Caban
b068f2fd0f [LLD][COFF] Process bitcode files separately for each symbol table on ARM64X (#123194) 2025-01-17 11:36:12 +01:00
Martin Storsjö
4a4a8a1476 [LLD] [COFF] Fix linking import libraries with -wholearchive: (#122806)
When LLD links against an import library (for the regular, short import
libraries), it doesn't actually link in the header/trailer object files
at all, but synthesizes new corresponding data structures into the right
sections.

If the whole of such an import library is forced to be linked, e.g. with
the -wholearchive: option, we actually end up linking in those
header/trailer objects. The header objects contain a construct which LLD
fails to handle; previously we'd error out with the error ".idata$4
should not refer to special section 0".

Within the import library header object, in the import directory we have
relocations towards the IAT (.idata$4 and .idata$5), but the header
object itself doesn't contain any data for those sections.

In the case of GNU generated import libraries, the header objects
contain zero length sections .idata$4 and .idata$5, with relocations
against them. However in the case of LLVM generated import libraries,
the sections .idata$4 and .idata$5 are not included in the list of
sections. The symbol table does contain section symbols for these
sections, but without any actual associated section. This can probably
be seen as a declaration of an empty section.

If the header/trailer objects of a short import library are linked
forcibly and we also reference other functions in the library, we end up
with two import directory entries for this DLL, one that gets
synthesized by LLD, and one from the actual header object file. This is
inelegant, but should be acceptable.

While it would seem unusual to link import libraries with the
-wholearchive: option, this can happen in certain scenarios.

Rust builds libraries that contain relevant import libraries bundled
along with compiled Rust code as regular object files, all within one
single archive. Such an archive can then end up linked with the
-wholearchive: option, if build systems decide to use such an option for
including static libraries.

This should fix https://github.com/msys2/MINGW-packages/issues/21017.

This works for the header/trailer object files in import libraries
generated by LLVM; import libraries generated by MSVC are slightly
different. ecb5ea6a26 made an attempt at
fixing the issue for MSVC generated libraries, but it's not entirely
correct, and isn't enough for making things work for that case.
2025-01-16 00:09:09 +02:00
Jacek Caban
616007d88f [LLD][COFF] Skip sections marked as IMAGE_SCN_LNK_INFO in the output image (#122752)
Fixes #106275.
2025-01-14 17:28:59 +01:00
Nico Weber
6cd171dc33 [lld/COFF] Support thin archives in /reproduce: files (#121512)
This already worked without /wholearchive; now it works with it too.
(Only for thin archives containing relative file names, matching the ELF
and Mach-O ports.)
2025-01-03 08:20:06 -05:00
Jacek Caban
8435225374 [LLD][COFF] Move addFile implementation to LinkerDriver (NFC) (#121342)
The addFile implementation does not rely on the SymbolTable object. With
#119294, the symbol table for input files is determined during the
construction of the objects representing them. To clarify that
relationship, this change moves the implementation from the SymbolTable
class to the LinkerDriver class.
2025-01-01 19:42:49 +01:00
Nico Weber
2b6713d3b8 [lld/coff] Fix assert on /start-lib foo.obj /end-lib during eager loads (#120292)
If foo.obj is eagerly loaded (due to a prior undef referencing one of
its symbols) and has more than one symbol, we used to assert:
SymbolTable::addLazyObject() for the first symbol would set `lazy` to
false and load all symbols from the file, but the outer
ObjFile::parseLazy() loop would continue to run and call addLazyObject()
for the second symbol, which would assert.

Instead, just stop adding lazy symbols if the file got loaded for real
while adding a symbol.

(The ELF port has a similar early exit in `ObjFile<ELFT>::parseLazy()`.)
2024-12-19 11:22:29 -05:00
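Note on 2b6713d3b8: a toy reproduction of the loop described in the message. `LazyFile`, `addLazyObject` and `parseLazy` here are simplified stand-ins that borrow the names from the message but not the real signatures; the `wanted` parameter is an assumption used to simulate a prior undefined reference.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model of a lazy object file and the symbols it would define.
struct LazyFile {
  std::vector<std::string> symbols;
  bool lazy = true;
  int loads = 0;
};

// Registers one lazy symbol. If an earlier undefined reference already wants
// it, the whole file is loaded immediately and stops being lazy.
void addLazyObject(LazyFile &file, const std::string &name,
                   const std::string &wanted) {
  assert(file.lazy && "lazy symbol added for a file that is already loaded");
  if (name == wanted) {
    file.lazy = false;
    ++file.loads;
  }
}

void parseLazy(LazyFile &file, const std::string &wanted) {
  for (const std::string &name : file.symbols) {
    addLazyObject(file, name, wanted);
    if (!file.lazy)
      return; // the file got loaded for real; stop adding lazy symbols
  }
}
```

Without the early `return`, the second iteration would call `addLazyObject` on an already loaded file and trip the assertion.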
Jacek Caban
16ef239520 [LLD][COFF] Introduce hybrid symbol table for EC input files on ARM64X (#119294) 2024-12-17 21:19:01 +01:00
Jacek Caban
9c8214ff31 [LLD][COFF] Create COFFObjectFile instance when constructing ObjFile (NFC) (#120144)
This change moves the creation of COFFObjectFile to the construction of
ObjFile, instead of delaying it until parsing.
2024-12-17 19:26:13 +01:00
Jacek Caban
7168de5ca7 Revert "[LLD][COFF] Introduce hybrid symbol table for EC input files on ARM64X (#119294)"
This reverts commit a8206e7b37 due to sanitizer failures.
2024-12-15 22:31:28 +01:00
Jacek Caban
a8206e7b37 [LLD][COFF] Introduce hybrid symbol table for EC input files on ARM64X (#119294)
On hybrid ARM64X targets, ARM64 and ARM64EC input files operate in
separate namespaces and cannot reference each other. This change
introduces separate `SymbolTable` instances and associates each
`InputFile` with the appropriate table to reflect this behavior.
2024-12-15 18:49:32 +01:00
Jacek Caban
d3c4857179 [LLD][COFF] Store machine type in SymbolTable (NFC) (#119298)
This change prepares for hybrid ARM64X support, which requires two
`SymbolTable` instances: one for native symbols and one for EC symbols.
In such cases, `config.machine` will remain ARM64X, while the
`SymbolTable` instances will store ARM64 and ARM64EC machine types.
2024-12-15 18:43:09 +01:00
Jacek Caban
6b493baec1 [LLD][COFF] Store reference to SymbolTable instead of COFFLinkerContext in InputFile (NFC) (#119296)
This change prepares for the introduction of separate hybrid namespaces.
Hybrid images will require two `SymbolTable` instances, making it
necessary to associate `InputFile` objects with the relevant one.
2024-12-15 12:45:34 +01:00
Fangrui Song
c7caab2238 [lld-link] Simplify some << toString 2024-12-05 20:56:19 -08:00
Fangrui Song
8b844de3c9 [lld-link] Replace fatal(...) with Fatal 2024-12-05 20:18:01 -08:00
Fangrui Song
8d225f10ef [lld-link] Replace error(...) with Err 2024-12-05 19:44:26 -08:00
Fangrui Song
4639a9a063 [lld-link] Replace log(...) with Log 2024-12-04 09:04:40 -08:00
Fangrui Song
1534f45694 [lld-link] Replace warn(...) with Warn(ctx) 2024-12-03 22:19:30 -08:00
Fangrui Song
982575fd06 [lld-link] Add context-aware diagnostic functions (#118430)
Similar to #112319 for ELF. While there is some initial boilerplate, it
can simplify some call sites that use Twine, especially when a printed
element uses `ctx` or toString.
2024-12-03 20:51:50 -08:00
Jacek Caban
581106759a [LLD][COFF] Support ARM64EC in BitcodeFile::getMachineType (#115474) 2024-11-09 13:21:58 +01:00
Jacek Caban
5d7afd324a [LLD][COFF] Add EC alias symbols for undefined x86_64 symbols on ARM64EC target (#114466) 2024-11-04 16:26:33 +01:00
Jacek Caban
9b88792291 [LLD][COFF] Allow overriding EC alias symbols with lazy archive symbols (#113283)
On ARM64EC, external function calls emit a pair of weak-dependency
aliases: `func` to `#func` and `#func` to the `func` guest exit thunk
(instead of a single undefined `func` symbol, which would be emitted on
other targets). Allow such aliases to be overridden by lazy archive
symbols, just as we would for undefined symbols.
2024-10-23 12:43:38 +02:00
Jacek Caban
f1ba8943c8 [LLD][COFF] Support anti-dependency symbols (#112542)
Co-authored-by: Billy Laws <blaws05@gmail.com>

Anti-dependency symbols are allowed to be duplicated, with the first
definition taking precedence. If a regular weak alias is present, it is
preferred over an anti-dependency definition. Chaining anti-dependencies
is not allowed.
2024-10-21 11:44:31 +02:00
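Note on f1ba8943c8: the precedence rules in the message can be sketched with a small table. `Alias` and `AliasTable` are hypothetical, and reporting a forbidden chain as an empty result is an assumption of this sketch; the commit does not say how the linker handles it.

```cpp
#include <map>
#include <string>

// Hypothetical model of a weak alias: `name` resolves to `target` unless a
// real definition of `name` shows up.
struct Alias {
  std::string target;
  bool antiDependency;
};

class AliasTable {
  std::map<std::string, Alias> aliases;

public:
  void add(const std::string &name, const std::string &target,
           bool antiDependency) {
    auto it = aliases.find(name);
    if (it == aliases.end()) {
      aliases[name] = Alias{target, antiDependency};
      return;
    }
    // A regular weak alias is preferred over an anti-dependency.
    if (it->second.antiDependency && !antiDependency)
      it->second = Alias{target, false};
    // Otherwise the first definition wins. (Conflicts between two regular
    // weak aliases are outside the scope of this sketch.)
  }

  // Returns the alias target, or an empty string when `name` has no alias or
  // when following it would chain one anti-dependency into another.
  std::string resolve(const std::string &name) const {
    auto it = aliases.find(name);
    if (it == aliases.end())
      return "";
    if (it->second.antiDependency) {
      auto next = aliases.find(it->second.target);
      if (next != aliases.end() && next->second.antiDependency)
        return "";
    }
    return it->second.target;
  }
};
```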
Jacek Caban
486f790d29 [LLD][COFF] Process all ARM64EC import symbols in MapFile's getSymbols (#109118) 2024-09-19 13:47:22 +02:00
Jacek Caban
a17a2451db [LLD][COFF] Add Support for auxiliary IAT copy (#108610)
In addition to the auxiliary IAT, ARM64EC modules also contain a copy of
it. At runtime, the auxiliary IAT is filled with the addresses of actual
ARM64EC functions when possible. If patching is detected, the OS may use
the IAT copy to revert the auxiliary IAT, ensuring that the call checker
is used for calls to imported functions.
2024-09-17 14:40:24 +02:00
Jacek Caban
ea5d37f4c1 [LLD][COFF] Add Support for ARM64EC Import Thunks (#108460)
ARM64EC import thunks function similarly to regular ARM64 thunks but use
a mangled name and perform the call through the auxiliary IAT.
2024-09-13 17:05:02 +02:00
Jacek Caban
6be9be5e0b [LLD][COFF][NFC] Store live flag in ImportThunkChunk. (#108459)
Instead of ImportFile. This is a preparation for ARM64EC support, which
has both x86 and ARM64EC thunks and each of them needs a separate flag.
2024-09-13 15:42:05 +02:00
Jacek Caban
82a36468c7 [LLD][COFF] Add support for ARM64EC auxiliary IAT (#108304)
In addition to the regular IAT, ARM64EC also includes an auxiliary IAT.
At runtime, the regular IAT is populated with the addresses of imported
functions, which may be x86_64 functions or the export thunks of ARM64EC
functions. The auxiliary IAT contains versions of functions that are
guaranteed to be directly callable by ARM64 code.

The linker fills the auxiliary IAT with the addresses of `__impchk_`
thunks. These thunks perform a call on the IAT address using
`__icall_helper_arm64ec` with the target address from the IAT. If the
imported function is an ARM64EC function, the OS may replace the address
in the auxiliary IAT with the address of the ARM64EC version of the
function (not its export thunk), avoiding the runtime call checker for
better performance.
2024-09-12 22:20:50 +02:00
Jacek Caban
99a2354993 [LLD][COFF] Add support for ARM64EC import call thunks. (#107931)
These thunks can be accessed using `__impchk_*` symbols, though they
are typically not called directly. Instead, they are used to populate the
auxiliary IAT. When the imported function is x86_64 (or an ARM64EC
function with a patched export thunk), the thunk is used to call it.
Otherwise, the OS may replace the thunk at runtime with a direct
pointer to the ARM64EC function to avoid the overhead.
2024-09-11 14:46:40 +02:00
Jacek Caban
7e0008d5ad [LLD][COFF][NFC] Create import thunks in ImportFile::parse. (#107929) 2024-09-11 12:22:36 +02:00
Jacek Caban
3d53212f61 [LLD][COFF] Initial support for ARM64EC importlibs. (#107164)
Use demangled symbol name for __imp_ symbols and define demangled thunk
symbol as AMD64 thunk.
2024-09-04 15:03:36 +02:00
Jacek Caban
519b36925c [LLD][COFF][NFC] Store impSym as DefinedImportData in ImportFile. (#107162) 2024-09-04 11:49:50 +02:00
Jacek Caban
ecc9aece72 [LLD][COFF] Use archive's ECSYMBOLS on ARM64EC target when available. (#106904) 2024-09-02 23:14:55 +02:00
Jacek Caban
e1cf849e82 [LLD][COFF] Use parentName for import files in toString. (#106104)
Improves diagnostic messages.
2024-08-26 22:08:33 +02:00
Jacek Caban
846dccce9c [LLD][COFF] Validate import library machine type. (#102738) 2024-08-11 19:03:09 +02:00
Jacek Caban
2849ebb19c [LLD][NFC] Make InputFile::getMachineType const. (#102737) 2024-08-10 15:03:23 +02:00
Jacek Caban
fed8e38c19 [LLD][COFF] Add support for ARM64EC entry thunks. (#88132)
For x86_64 callable functions, ARM64EC requires an entry thunk generated
by the compiler. The linker interprets .hybmp sections to associate
function chunks with their entry points and writes an offset to thunks
preceding function section contents.

Additionally, ICF needs to be aware of entry thunks to not consider
chunks to be equal when they have different entry thunks, and GC needs
to mark entry thunks together with function chunks.

I used a new SectionChunkEC class instead of storing entry thunks in
SectionChunk, following the guideline to keep SectionChunk as compact as
possible. This way, there is no memory usage increase on non-EC targets.
2024-06-18 11:14:01 +02:00
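Note on fed8e38c19: the ICF requirement from the second paragraph, reduced to a toy comparison. `Chunk` and `canFold` are hypothetical; real identical-code folding compares much more than raw contents, so this only illustrates the extra entry-thunk condition.

```cpp
#include <string>

// Toy function chunk with an optional associated entry thunk.
struct Chunk {
  std::string contents;
  const Chunk *entryThunk = nullptr;
};

// Two chunks may be folded only if their contents match and they share the
// same entry thunk; identical bodies with different thunks must stay apart.
bool canFold(const Chunk &a, const Chunk &b) {
  return a.contents == b.contents && a.entryThunk == b.entryThunk;
}
```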
GkvJwa
c11677eedb [LLD][COFF] Support finding pdb files from outputpath (#94153)
In addition to looking for dependent (input) PDB files next to the associated .OBJ file, we now also look in the output folder. This mimics MSVC link.exe behavior.

Fixes #94152
2024-06-17 11:20:06 -04:00
Jacek Caban
7b275aa243 [LLD][COFF] Add support for IMPORT_NAME_EXPORTAS import library names. (#83211)
This allows handling importlibs produced by llvm-dlltool in #78772.
ARM64EC import libraries use it by default, but it's supported by MSVC
link.exe on other platforms too.

This also avoids assuming null-terminated input, like in #78769.
2024-03-11 00:13:04 +01:00
Kazu Hirata
21730eb49b [lld] Use SmallString::operator std::string (NFC) 2024-01-22 00:13:23 -08:00
Martin Storsjö
d0986519d5 [LLD] [COFF] Preserve directives and export names from LTO objects (#78802)
The export names are saved as StringRefs pointing into the COFF
directives. In the case of LTO objects, this can be memory allocated
that is owned by the LTO InputFile, which gets destructed when doing the
compilation.

In the case of LTO objects from an older version of LLVM, which require
being upgraded when loaded, the directives string gets destructed, while
when using LTO objects of a matching version (the common case), the
directives string points into memory that doesn't get destructed on LTO
compilation.

Test this by linking a bundled binary LTO object file, from an older
version of LLVM.

This fixes issue #78591, and downstream issue
https://github.com/mstorsjo/llvm-mingw/issues/392.
2024-01-20 16:15:44 +02:00
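Note on d0986519d5: the bug class described in the message is a string reference outliving its buffer. One common way to avoid it, shown here as a sketch that assumes C++17 and uses `std::string_view` in place of `StringRef`, is to copy the text into storage owned by something longer-lived. `OwnedStrings` is a hypothetical helper, not the mechanism the commit necessarily uses.

```cpp
#include <deque>
#include <string>
#include <string_view>

// Hypothetical owner for strings that must outlive their original buffer.
class OwnedStrings {
  // A deque never relocates existing elements when it grows at the end, so
  // views into earlier strings stay valid.
  std::deque<std::string> storage;

public:
  // Copies `text` and returns a view of the copy.
  std::string_view save(std::string_view text) {
    storage.emplace_back(text);
    return storage.back();
  }
};
```

A view returned by `save` remains usable after the buffer it was copied from (for example, a directives string owned by a destroyed input file) is gone, as long as the `OwnedStrings` object is still alive.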
Martin Storsjö
23e6e88187 [LLD] [COFF] Rewrite the config flags for dwarf debug info or symtab. NFC. (#75172)
This shouldn't have any user visible effect, but makes the logic within
the linker implementation more explicit.

Note how DWARF debug info sections were retained even if enabling a link
with PDB info only; that behaviour is preserved.
2023-12-15 20:01:13 +02:00
Martin Storsjö
143133fe68 [LLD] [COFF] Don't preserve unnecessary __imp_ prefixed symbols (#72989)
This redoes the fix from 3ab6209a3f
differently, without the unwanted effect of preserving unnecessary
`__imp_` prefixed symbols.

If the referencing object is a regular object, the `__imp_` symbol will
have `isUsedInRegularObj` set on it from that already. If the
referencing object is an LTO object, we set `isUsedInRegularObj` for any
symbol starting with `__imp_`.

If the object file defining the `__imp_` symbol is a regular object, the
`isUsedInRegularObj` flag has no effect. If it is an LTO object, it
causes the symbol to be preserved.
2023-12-04 23:38:46 +02:00
Martin Storsjö
e8961969ec [LLD] [COFF] Fix deducing the machine type from LTO objects for ARM/Thumb (#71335)
In practice, all the Windows ARMNT IR objects show the architecture type
Thumb, not ARM.

Most other switch cases for architecture in lld/COFF check for and treat
`arm` and `thumb` equally.
2023-11-07 12:00:31 +02:00
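Note on e8961969ec: a sketch of treating `arm` and `thumb` alike when deducing a machine type. The enums and `machineForArch` are hypothetical simplifications and not the LLVM types.

```cpp
// Hypothetical, simplified architecture and machine enums.
enum class Arch { arm, thumb, aarch64, x86, x86_64, unknown };
enum class Machine { ARMNT, ARM64, I386, AMD64, Unknown };

// Both 32-bit ARM spellings map to the same Windows machine type.
Machine machineForArch(Arch arch) {
  switch (arch) {
  case Arch::arm:
  case Arch::thumb:
    return Machine::ARMNT;
  case Arch::aarch64:
    return Machine::ARM64;
  case Arch::x86:
    return Machine::I386;
  case Arch::x86_64:
    return Machine::AMD64;
  case Arch::unknown:
    break;
  }
  return Machine::Unknown;
}
```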