1053 Commits

Author SHA1 Message Date
Alexandre Ganea
6f2e92c10c Re-land [LLD] Allow usage of LLD as a library
This reverts commit aa495214b3.

As discussed in https://github.com/llvm/llvm-project/issues/53475 this patch
allows for using LLD-as-a-lib. It also lets clients link only the drivers that
they want (see unit tests).

This also adds the unit test infra as in the other LLVM projects. Among the
test coverage, I've added the original issue from @krzysz00, see:
https://github.com/ROCmSoftwarePlatform/D108850-lld-bug-reproduction

Important note: this doesn't allow (yet) linking in parallel. This will come a
bit later hopefully, in subsequent patches, for COFF at least.

Differential revision: https://reviews.llvm.org/D119049
2023-06-19 07:35:11 -04:00
Leonard Chan
aa495214b3 Revert "[LLD] Allow usage of LLD as a library"
This reverts commit 2700da5fe2.

Reverting since this causes some test failures on our builders: https://ci.chromium.org/ui/p/fuchsia/builders/toolchain.ci/clang-linux-x64/b8778372807208184913/overview
2023-06-14 20:36:27 +00:00
Alexandre Ganea
2700da5fe2 [LLD] Allow usage of LLD as a library
As discussed in https://github.com/llvm/llvm-project/issues/53475 this patch allows using LLD-as-a-lib. It also lets clients link only the drivers that they want (see unit tests).

This also adds the unit test infra as in the other LLVM projects. Among the test coverage, I've added the original issue from @krzysz00, see: https://github.com/ROCmSoftwarePlatform/D108850-lld-bug-reproduction

Important note: this doesn't allow (yet) linking in parallel. This will come a bit later, in subsequent patches, for COFF at least.

Differential revision: https://reviews.llvm.org/D119049
2023-06-13 16:22:59 -04:00
Scott Linder
45ee0a9afc [LLD] Add --lto-CGO[0-3] option
Allow controlling the CodeGenOpt::Level independent of the LTO
optimization level in LLD via new options for the COFF, ELF, MachO, and
wasm frontends to lld. Most are spelled as --lto-CGO[0-3], but COFF is
spelled as -opt:lldltocgo=[0-3].

See D57422 for discussion surrounding the issue of how to set the CG opt
level. The ultimate goal is to let each function control its CG opt
level, but until then the current default means it is impossible to
specify a CG opt level lower than 2 while using LTO. This option gives
the user a means to control it for as long as it is not handled on a
per-function basis.

Reviewed By: MaskRay, #lld-macho, int3

Differential Revision: https://reviews.llvm.org/D141970
2023-02-15 17:34:35 +00:00
Fangrui Song
6b9a80de49 [lld] Fix iwyu problems after 83d59e05b2
The commit transitively includes lld/include/lld/Common/ErrorHandler.h into
lld/include/lld/Common/Driver.h, which is not intended.
2022-12-28 10:46:45 -08:00
Fangrui Song
a996cc217c Remove unused #include "llvm/ADT/Optional.h" 2022-12-05 06:31:11 +00:00
Fangrui Song
bac974278c CodeGen/CommandFlags: Convert Optional to std::optional 2022-12-03 18:38:12 +00:00
Krzysztof Parzyszek
8c7c20f033 Convert Optional<CodeModel> to std::optional<CodeModel> 2022-12-03 12:08:47 -06:00
Fangrui Song
c33511c8df [lld] Change Optional to std::optional
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-11-27 17:25:34 -08:00
Jez Ng
32647c8f53 [lld][nfc] Remove lld::demangle() (partial revert of D116279)
{D116279}, in addition to adding support for other demanglers, also
factored out some of the demangling logic. However, I don't think the
abstraction really carries its weight -- after {D135942}, only the ELF
and WASM backends call it with anything other than a non-constant
`shouldDemangle` argument. The COFF and Mach-O backends were already
doing the should-demangle check before calling `demangle()`.

Reviewed By: MaskRay, #lld-macho

Differential Revision: https://reviews.llvm.org/D135943
2022-10-14 15:28:47 -04:00
Fangrui Song
f6bd0a8f2b [ELF] Add makeThreadLocal/makeThreadLocalN and remove InputFile::localSymStorage
makeThreadLocal/makeThreadLocalN are moved from D130810 ([ELF] Parallelize input
section initialization) here to make D130810 more focused on the refactor:

* COFF has some needs for multiple linker contexts. D108850 partially removed
  global states from lldCommon but left the global variable `lctx`.
* To the best of my knowledge, all multiple-linker-context feature requests to
  ELF are more from user convenience, with no very strong argument.
* In practice, ELF port is very difficult to remove global states without
  introducing significant performance regression/hurting code readability.
* Per-thread allocators from D122922/D123879 are too expensive and will not
  really benefit ELF.

This patch adds a simple thread_local based makeThreadLocal to
lld/Common/Memory.h. It will enable further optimization in ELF.
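
The idea behind a thread_local based helper can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in lld/Common/Memory.h: `ThreadLocalArena`, `threadLocalArena`, and `makeThreadLocalSketch` are hypothetical names, and a vector of `std::unique_ptr` stands in for LLVM's allocators.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical per-thread arena. It owns every object it creates and
// destroys them when the owning thread exits.
template <typename T> class ThreadLocalArena {
public:
  template <typename... Args> T *make(Args &&...args) {
    objects.push_back(std::make_unique<T>(std::forward<Args>(args)...));
    return objects.back().get();
  }
  std::size_t size() const { return objects.size(); }

private:
  std::vector<std::unique_ptr<T>> objects;
};

// One arena per (type, thread) pair, so callers never need a lock.
template <typename T> ThreadLocalArena<T> &threadLocalArena() {
  thread_local ThreadLocalArena<T> arena;
  return arena;
}

// Allocates a T that lives until the calling thread exits.
template <typename T, typename... Args>
T *makeThreadLocalSketch(Args &&...args) {
  T *obj = threadLocalArena<T>().make(std::forward<Args>(args)...);
  assert(obj && "the arena always returns a live object");
  return obj;
}
```

Because each thread only touches its own arena, parallel passes can allocate without synchronization. The trade-off in this sketch is that objects must not be used after the thread that created them has exited.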
2022-08-04 11:09:40 -07:00
Fangrui Song
4b2b68d5ab [lld] Change vector to SmallVector. NFC
My lld executable is 1.6KiB smaller and some functions are now more efficient.
2022-07-30 18:11:21 -07:00
Daniel Bertalan
0836fc395f [NFC][lld] Fix typos to test commit access 2022-06-24 00:19:18 +02:00
Nico Weber
7cb49996f7 [lld] Remove lld/include/lld/Core
This is all dead code that we forgot to delete in
https://reviews.llvm.org/D114842

Differential Revision: https://reviews.llvm.org/D128147
2022-06-19 21:37:13 -04:00
Keith Smiley
7d57c69826 [lld-macho] Add support for -w
This flag suppresses warnings produced by the linker. In ld64 this has
an interesting interaction with -fatal_warnings, it silences the
warnings but the link still fails. Instead of doing that here we still
print the warning and eagerly fail the link in case both are passed,
this seems more reasonable so users can understand why the link fails.

Differential Revision: https://reviews.llvm.org/D127564
2022-06-11 17:38:50 -07:00
Fangrui Song
941f06282a [lld] Make error handling functions opaque
The inline `lld::error` expands to two function calls `errorHandler` and `error`
where the latter is opaque. Move the functions to .cpp files to decrease code
size.

My x86-64 lld executable is 9KiB smaller.

Reviewed By: #lld-macho, thakis

Differential Revision: https://reviews.llvm.org/D120002
2022-02-17 11:54:57 -08:00
Jez Ng
69297cf639 [lld-macho] Don't include CommandFlags.h in CommonLinkerContext.h
Main motivation: including `llvm/CodeGen/CommandFlags.h` in
`CommonLinkerContext.h` means that the declaration of `llvm::Reloc` is
visible in any file that includes `CommonLinkerContext.h`. Since our
cpp files have both `using namespace llvm` and `using namespace
lld::macho`, this results in conflicts with `lld::macho::Reloc`.

I suppose we could put `llvm::Reloc` into a nested namespace, but in general,
I think we should avoid transitively including too many header files in
a very widely used header like `CommonLinkerContext.h`.

RegisterCodeGenFlags' ctor initializes a bunch of function-`static`
structures and does nothing else, so it should be fine to "initialize"
it as a temporary stack variable rather than as a file static.

Reviewed By: aganea

Differential Revision: https://reviews.llvm.org/D119913
2022-02-16 20:05:07 -05:00
Krzysztof Drewniak
1ce314ce6b [MLIR][GPU][lld] Use LLD bundled in ROCm, removing workaround
Having clarified that executing the SerializeToHsaco pass can
depend on a ROCm installation, switch from calling lld as a library to
using the copy of lld guaranteed to be included in a ROCm install.

This removes the workaround introduced in D119277

Reviewed By: whchung

Differential Revision: https://reviews.llvm.org/D119463
2022-02-10 19:37:30 +00:00
Alexandre Ganea
1e661e583d [MLIR] Temporary workaround for calling the LLD ELF driver as-a-lib
This fixes the situation described in https://github.com/llvm/llvm-project/issues/53475 with a repro exposed by https://github.com/ROCmSoftwarePlatform/D108850-lld-bug-reproduction

This is purposely just a workaround to unblock users. This could be transplanted to the release/14.x branch if need be. A proper fix will later be provided in https://reviews.llvm.org/D119049.

Differential Revision: https://reviews.llvm.org/D119277
2022-02-08 19:12:15 -05:00
Alexandre Ganea
83d59e05b2 Re-land [LLD] Remove global state in lldCommon
Move all variables at file-scope or function-static-scope into a hosting structure (lld::CommonLinkerContext) that lives at lldMain()-scope. Drivers will inherit from this structure and add their own global state, in the same way as for the existing COFFLinkerContext.
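
The pattern described above can be sketched roughly as follows. This is an illustration only: `CommonContextSketch`, `DriverContextSketch`, and `linkSketch` are hypothetical names and do not reflect the real members of lld::CommonLinkerContext.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for lld::CommonLinkerContext: state that used to be
// file-scope or function-static now lives in an object.
struct CommonContextSketch {
  unsigned errorCount = 0;
  std::vector<std::string> messages;
  virtual ~CommonContextSketch() = default;
};

// A driver extends the common context with its own former globals.
struct DriverContextSketch : CommonContextSketch {
  std::vector<std::string> inputFiles;
};

// Every piece of state the "link" touches is reached through ctx, so two
// links in one process cannot observe each other's leftovers.
int linkSketch(DriverContextSketch &ctx, const std::vector<std::string> &args) {
  assert(ctx.inputFiles.empty() && ctx.errorCount == 0 &&
         "use a fresh context per link");
  for (const std::string &arg : args) {
    if (arg.empty()) {
      ++ctx.errorCount;
      ctx.messages.push_back("error: empty input name");
      continue;
    }
    ctx.inputFiles.push_back(arg);
  }
  return ctx.errorCount == 0 ? 0 : 1;
}
```

Creating the context at the entry point and destroying it on return is what makes a second link in the same process start from a clean state.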

See discussion in https://lists.llvm.org/pipermail/llvm-dev/2021-June/151184.html

The previous land f860fe3622 caused issues in https://lab.llvm.org/buildbot/#/builders/123/builds/8383, fixed by 22ee510dac.

Differential Revision: https://reviews.llvm.org/D108850
2022-01-20 14:53:26 -05:00
Alexandre Ganea
e6b153947d Revert [LLD] Remove global state in lldCommon
It seems to be causing issues on https://lab.llvm.org/buildbot/#/builders/123/builds/8383
2022-01-16 11:03:06 -05:00
Alexandre Ganea
30a4020a7d [LLD] Supplement with more comments. Clarify the intention in f860fe3622. 2022-01-16 09:17:39 -05:00
Alexandre Ganea
f860fe3622 [LLD] Remove global state in lldCommon
Move all variables at file-scope or function-static-scope into a hosting structure (lld::CommonLinkerContext) that lives at lldMain()-scope. Drivers will inherit from this structure and add their own global state, in the same way as for the existing COFFLinkerContext.

See discussion in https://lists.llvm.org/pipermail/llvm-dev/2021-June/151184.html

Differential Revision: https://reviews.llvm.org/D108850
2022-01-16 08:57:57 -05:00
Nico Weber
085f078307 Revert "Revert D109159 "[amdgpu] Enable selection of s_cselect_b64.""
This reverts commit 859ebca744.
The change contained many unrelated changes and e.g. restored
unit test files for the old lld port.
2022-01-05 13:10:25 -05:00
David Salinas
859ebca744 Revert D109159 "[amdgpu] Enable selection of s_cselect_b64."
This reverts commit 640beb38e7.

That commit caused performance degradation in Quicksilver test QS:sGPU and a functional test failure in (rocPRIM rocprim.device_segmented_radix_sort).
Reverting until we have a better solution to s_cselect_b64 codegen cleanup.

Change-Id: Ibf8e397df94001f248fba609f072088a46abae08

Reviewed By: kzhuravl

Differential Revision: https://reviews.llvm.org/D115960

Change-Id: Id169459ce4dfffa857d5645a0af50b0063ce1105
2022-01-05 17:57:32 +00:00
Luís Ferreira
10e40a4ea3 [lld] Add support for other demanglers other than Itanium
LLVM core library supports demangling other mangled symbols other than itanium,
such as D and Rust. LLD should use those demanglers in order to output pretty
demangled symbols on error messages.

Reviewed By: MaskRay, #lld-macho

Differential Revision: https://reviews.llvm.org/D116279
2022-01-05 03:25:41 +00:00
Luís Ferreira
8792cd75d0 Revert "[lld] Add support for other demanglers other than Itanium"
This reverts commit e60d6dfd5a.

clang-ppc64le-rhel buildbot failed (https://lab.llvm.org/buildbot#builders/57/builds/13424):

    tools/lld/MachO/CMakeFiles/lldMachO.dir/Symbols.cpp.o: In function `lld::demangle(llvm::StringRef, bool)':
    Symbols.cpp:(.text._ZN3lld8demangleEN4llvm9StringRefEb[_ZN3lld8demangleEN4llvm9StringRefEb]+0x90): undefined reference to `llvm::demangle(std::string const&)'
2021-12-30 18:04:21 +00:00
Luís Ferreira
e60d6dfd5a [lld] Add support for other demanglers other than Itanium
LLVM core library supports demangling other mangled symbols other than itanium,
such as D and Rust. LLD should use those demanglers in order to output pretty
demangled symbols on error messages.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D116279
2021-12-30 17:52:38 +00:00
Keith Smiley
9e3552523e [lld-macho] Remove old macho darwin lld
During the llvm round table it was generally agreed that the newer macho
lld implementation is feature complete enough to replace the old
implementation entirely. This will reduce confusion for new users who
aren't aware of the history.

Differential Revision: https://reviews.llvm.org/D114842
2021-12-02 11:04:49 -08:00
Nico Weber
64c1734438 [lld/mac] Write -v output to stderr
This matches ld64, and it's conceivable that projects try to read
this information off stderr for that reason.

--version keeps writing to stdout.

Differential Revision: https://reviews.llvm.org/D113020
2021-11-02 13:59:14 -04:00
Heejin Ahn
3ec1760d91 [WebAssembly] Remove WasmTagType
This removes `WasmTagType`. `WasmTagType` contained an attribute and a
signature index:
```
struct WasmTagType {
  uint8_t Attribute;
  uint32_t SigIndex;
};
```

Currently the attribute field is not used and reserved for future use,
and always 0. And that this class contains `SigIndex` as its property is
a little weird in the place, because the tag type's signature index is
not an inherent property of a tag but rather a reference to another
section that changes after linking. This makes tag handling in the
linker also weird that tag-related methods are taking both `WasmTagType`
and `WasmSignature` even though `WasmTagType` contains a signature
index. This is because the signature index changes in linking so it
doesn't have any info at this point. This instead moves `SigIndex` to
`struct WasmTag` itself, as we did for `struct WasmFunction` in D111104.

In this CL, in lib/MC and lib/Object, this now treats tag types in the
same way as function types. Also in YAML, this removes `struct Tag`,
because now it only contains the tag index. Also tags set `SigIndex` in
`WasmImport` union, as functions do.

I think this makes things simpler and makes tag handling more in line
with function handling. These two shares similar properties in that both
of them have signatures, but they are kind of nominal so having the same
signature doesn't mean they are the same element.

Also a drive-by fix: the reserved 'attribute' part's encoding changed
from uleb32 to uint8 a while ago. This was fixed in lib/MC and
lib/Object but not in YAML. This doesn't change object files because the
field's value is always 0, and 0 is encoded identically in both
encodings.

This is effectively NFC; I didn't mark it as such just because it
changed YAML test results.

Reviewed By: sbc100, tlively

Differential Revision: https://reviews.llvm.org/D111086
2021-10-05 17:11:22 -07:00
Amy Huang
6f7483b1ec Reland "[LLD] Remove global state in lld/COFF" after fixing asan and msan test failures
Original commit description:

  [LLD] Remove global state in lld/COFF

  This patch removes globals from the lldCOFF library, by moving globals
  into a context class (COFFLinkingContext) and passing it around wherever
  it's needed.

  See https://lists.llvm.org/pipermail/llvm-dev/2021-June/151184.html for
  context about removing globals from LLD.

  I also haven't moved the `driver` or `config` variables yet.

  Differential Revision: https://reviews.llvm.org/D109634

This reverts commit a2fd05ada9.

Original commits were b4fa71eed3
and e03c7e367a.
2021-09-17 17:18:42 -07:00
Amy Huang
a2fd05ada9 Temporarily revert "[LLD] Remove global state in lld/COFF" and "[lld] Add test to
check for timer output"

Seems to be causing a number of asan test failures.

This reverts commit b4fa71eed3
and e03c7e367a.
2021-09-16 11:58:11 -07:00
Amy Huang
b4fa71eed3 [LLD] Remove global state in lld/COFF
This patch removes globals from the lldCOFF library, by moving globals
into a context class (COFFLinkingContext) and passing it around wherever
it's needed.

See https://lists.llvm.org/pipermail/llvm-dev/2021-June/151184.html for
context about removing globals from LLD.

I also haven't moved the `driver` or `config` variables yet.

Differential Revision: https://reviews.llvm.org/D109634
2021-09-16 11:00:23 -07:00
Fangrui Song
0db402c5b4 [lld] Buffer writes when composing a single diagnostic
llvm::errs() is unbuffered. On a POSIX platform, composing a diagnostic
string may invoke the ::write syscall multiple times, which can be slow.
Buffer writes to a temporary SmallString when composing a single diagnostic to
reduce the number of ::write syscalls to one (also easier to read under
strace/truss).
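
The buffering idea can be sketched as follows. This is a simplified illustration, not the ErrorHandler code: `composeDiagnostic` and `emitDiagnostic` are hypothetical names, `std::string` stands in for SmallString, and `std::ostream` stands in for llvm::errs().

```cpp
#include <cassert>
#include <ostream>
#include <string>

// Builds the whole diagnostic in memory first. std::string stands in for
// the SmallString mentioned in the commit message.
std::string composeDiagnostic(const std::string &tool, const std::string &level,
                              const std::string &msg) {
  assert(!msg.empty() && "a diagnostic needs a message");
  std::string buf;
  buf.reserve(tool.size() + level.size() + msg.size() + 5);
  buf += tool;
  buf += ": ";
  buf += level;
  buf += ": ";
  buf += msg;
  buf += '\n';
  return buf;
}

// Hands the finished text to the stream in one call instead of one call
// per fragment.
void emitDiagnostic(std::ostream &os, const std::string &tool,
                    const std::string &level, const std::string &msg) {
  const std::string buf = composeDiagnostic(tool, level, msg);
  os.write(buf.data(), static_cast<std::streamsize>(buf.size()));
}
```

With an unbuffered sink, each fragment appended directly to the stream could become its own write request. Composing first means the sink sees the complete line at once.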

For an invocation of ld.lld with 62000+ lines of
`ld.lld: warning: symbol ordering file: no such symbol: ` warnings (D87121),
the buffering decreases the write time from 1s to 0.4s (for /dev/tty) and
from 0.4s to 0.1s (for a tmpfs file). This can speed up
`relocation R_X86_64_PC32 out of range` diagnostic printing as well
with `--noinhibit-exec --no-fatal-warnings`.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D87272
2021-09-09 09:27:14 -07:00
Fangrui Song
323b9bf862 [lld] Replace LLVM_ATTRIBUTE_NORETURN with [[noreturn]]
[[noreturn]] can be used since 2016 when the minimum compiler requirement was bumped to GCC 4.8/MSVC 2015.
2021-07-27 18:51:17 -07:00
Heejin Ahn
1d891d44f3 [WebAssembly] Rename event to tag
We recently decided to change 'event' to 'tag', and 'event section' to
'tag section', out of the rationale that the section contains a
generalized tag that references a type, which may be used for something
other than exceptions, and the name 'event' can be confusing in the web
context.

See
- https://github.com/WebAssembly/exception-handling/issues/159#issuecomment-857910130
- https://github.com/WebAssembly/exception-handling/pull/161

Reviewed By: tlively

Differential Revision: https://reviews.llvm.org/D104423
2021-06-17 20:34:19 -07:00
Nikita Popov
d93b678abb [lld] Add missing includes (NFC)
Fix lld build after 983565a6fe.
2021-06-03 18:55:18 +02:00
Yang Fan
062d4ddd22 [lld] Add missing header guard (NFC) 2021-04-02 11:12:23 +08:00
Jez Ng
9b6dde8af8 [lld-macho] Parallelize UUID hash computation
This reuses the approach (and some code) from LLD-ELF.

It's a decent win when linking chromium_framework on a Mac Pro (3.2 GHz 16-Core Intel Xeon W):

      N           Min           Max        Median           Avg        Stddev
  x  20          4.58          4.83          4.66        4.6685   0.066591844
  +  20          4.42          4.61           4.5         4.505    0.04751731
  Difference at 95.0% confidence
          -0.1635 +/- 0.0370242
          -3.5022% +/- 0.793064%
          (Student's t, pooled s = 0.0578462)

The output binary is 381MB.

Reviewed By: #lld-macho, oontvoo

Differential Revision: https://reviews.llvm.org/D99279
2021-03-31 15:48:36 -04:00
Nico Weber
cf59ffbfe3 fix comment typo to cycle bots 2021-02-17 11:49:23 -05:00
Andy Wingo
a56e57493b [lld][WebAssembly] Common superclass for input globals/events/tables
This commit regroups commonalities among InputGlobal, InputEvent, and
InputTable into the new InputElement.  The subclasses are defined
inline in the new InputElement.h.  NFC.

Reviewed By: sbc100

Differential Revision: https://reviews.llvm.org/D94677
2021-02-11 14:54:45 +01:00
Andy Wingo
53e3b81faa [lld][WebAssembly] Add support for handling table symbols
This commit adds table symbol support in a partial way, while still
including some special cases for the __indirect_function_table symbol.
No change in tests.

Differential Revision: https://reviews.llvm.org/D94075
2021-01-14 11:13:13 +01:00
Alexandre Ganea
45b8a741fb [LLD][COFF] When using LLD-as-a-library, always prevent re-entrance on failures
This is a follow-up for D70378 (Cover usage of LLD as a library).

While debugging an intermittent failure on a bot, I recalled this scenario which
causes the issue:

1. When executing lld/test/ELF/invalid/symtab-sh-info.s L45, we reach
   lld::elf::ObjFile::ObjFile() which goes straight into its base ELFFileBase(),
   then ELFFileBase::init().
2. At that point fatal() is thrown in lld/ELF/InputFiles.cpp L381, leaving a
   half-initialized ObjFile instance.
3. We then end up in lld::exitLld() and since we are running with LLD_IN_TEST, we
   happily restore the control flow to CrashRecoveryContext::RunSafely() then back
   in lld::safeLldMain().
4. Before this patch, we called errorHandler().reset() just after, and this
   attempted to reset the associated SpecificAlloc<ObjFile<ELF64LE>>. That tried
   to free the half-initialized ObjFile instance, and more precisely its
   ObjFile::dwarf member.

Sometimes that worked, sometimes it failed and was caught by the
CrashRecoveryContext. This scenario was the reason we called
errorHandler().reset() through a CrashRecoveryContext.

But in some rare cases, the above repro somehow corrupted the heap, creating a
stack overflow. When the CrashRecoveryContext's filter (that is,
__except (ExceptionFilter(GetExceptionInformation()))) tried to handle the
exception, it crashed again since the stack was exhausted -- and that took the
whole application down. That is the issue seen on the bot. Locally it happens
about 1 time out of 15.

Now this situation can happen anywhere in LLD. Since catching stack overflows is
not a reliable scenario ATM when using CrashRecoveryContext, we're now
preventing further re-entrance when such failures occur, by signaling
lld::SafeReturn::canRunAgain=false. When running with LLD_IN_TEST=2 (or above),
only one iteration will be executed, instead of two.
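
The re-entrance guard can be sketched as a loop that stops as soon as a run reports that the linker must not be entered again. This is a minimal illustration: `SafeReturnSketch` and `runLinkIterations` are hypothetical names, and the callback stands in for a call to lld::safeLldMain().

```cpp
#include <cassert>
#include <functional>

// Hypothetical mirror of the result described above: an exit code plus a
// flag telling the caller whether the linker may be entered again.
struct SafeReturnSketch {
  int ret;
  bool canRunAgain;
};

// Runs `link` up to `iterations` times in one process, stopping early once
// a run reports that re-entrance is no longer safe.
inline int runLinkIterations(unsigned iterations,
                             const std::function<SafeReturnSketch()> &link) {
  assert(iterations > 0 && "at least one link must run");
  int ret = 0;
  for (unsigned i = 0; i < iterations; ++i) {
    const SafeReturnSketch result = link();
    ret = result.ret;
    if (!result.canRunAgain)
      break; // State may be corrupted: do not re-enter the linker.
  }
  return ret;
}
```

A client embedding the linker would apply the same rule: once a run reports that it cannot run again, fall back to a fresh process rather than calling into the library a second time.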

Differential Revision: https://reviews.llvm.org/D88348
2020-11-12 08:14:43 -05:00
serge-sans-paille
1e70ec10eb [lld] Provide a hook to customize undefined symbols error handling
This is a follow up to https://reviews.llvm.org/D87758, implementing the missing
symbol part, as done by binutils.

Differential Revision: https://reviews.llvm.org/D89687
2020-11-09 13:28:48 +01:00
serge-sans-paille
cfc32267e2 Provide a hook to customize missing library error handling
Make it possible for lld users to provide a custom script that would help to
find missing libraries. A possible scenario could be:

    % clang /tmp/a.c -fuse-ld=lld -loauth -Wl,--error-handling-script=/tmp/addLibrary.py
    unable to find library -loauth
    looking for relevant packages to provides that library

        liboauth-0.9.7-4.el7.i686
        liboauth-devel-0.9.7-4.el7.i686
        liboauth-0.9.7-4.el7.x86_64
        liboauth-devel-0.9.7-4.el7.x86_64
        pix-1.6.1-3.el7.x86_64

Where addLibrary would be called with the missing library name as first argument
(in that case addLibrary.py oauth)

Differential Revision: https://reviews.llvm.org/D87758
2020-11-03 11:01:29 +01:00
Reid Kleckner
5519e4da83 Re-land "[PDB] Merge types in parallel when using ghashing"
Stored Error objects have to be checked, even if they are success
values.

This reverts commit 8d250ac3cd.
Relands commit 49b3459930655d879b2dc190ff8fe11c38a8be5f.

Original commit message:
-----------------------------------------

This makes type merging much faster (-24% on chrome.dll) when multiple
threads are available, but it slightly increases the time to link (+10%)
when /threads:1 is passed. With only one more thread, the new type
merging is faster (-11%). The output PDB should be identical to what it
was before this change.

To give an idea, here is the /time output placed side by side:
                              BEFORE    | AFTER
  Input File Reading:           956 ms  |  968 ms
  Code Layout:                  258 ms  |  190 ms
  Commit Output File:             6 ms  |    7 ms
  PDB Emission (Cumulative):   6691 ms  | 4253 ms
    Add Objects:               4341 ms  | 2927 ms
      Type Merging:            2814 ms  | 1269 ms  -55%!
      Symbol Merging:          1509 ms  | 1645 ms
    Publics Stream Layout:      111 ms  |  112 ms
    TPI Stream Layout:          764 ms  |   26 ms  trivial
    Commit to Disk:            1322 ms  | 1036 ms  -300ms
----------------------------------------- --------
Total Link Time:               8416 ms    5882 ms  -30% overall

The main source of the additional overhead in the single-threaded case
is the need to iterate all .debug$T sections up front to check which
type records should go in the IPI stream. See fillIsItemIndexFromDebugT.
With changes to the .debug$H section, we could pre-calculate this info
and eliminate the need to do this walk up front. That should restore
single-threaded performance back to what it was before this change.

This change will cause LLD to be much more parallel than it used to, and
for users who do multiple links in parallel, it could regress
performance. However, when the user is only doing one link, it's a huge
improvement. In the future, we can use NT worker threads to avoid
oversaturating the machine with work, but for now, this is such an
improvement for the single-link use case that I think we should land
this as is.

Algorithm
----------

Before this change, we essentially used a
DenseMap<GloballyHashedType, TypeIndex> to check if a type has already
been seen, and if it hasn't been seen, insert it now and use the next
available type index for it in the destination type stream. DenseMap
does not support concurrent insertion, and even if it did, the linker
must be deterministic: it cannot produce different PDBs by using
different numbers of threads. The output type stream must be in the same
order regardless of the order of hash table insertions.

In order to create a hash table that supports concurrent insertion, the
table cells must be small enough that they can be updated atomically.
The algorithm I used for updating the table using linear probing is
described in this paper, "Concurrent Hash Tables: Fast and General(?)!":
https://dl.acm.org/doi/10.1145/3309206

The GHashCell in this change is essentially a pair of 32-bit integer
indices: <sourceIndex, typeIndex>. The sourceIndex is the index of the
TpiSource object, and it represents an input type stream. The typeIndex
is the index of the type in the stream. Together, we have something like
a ragged 2D array of ghashes, which can be looked up as:
  tpiSources[tpiSrcIndex]->ghashes[typeIndex]

By using these side tables, we can omit the key data from the hash
table, and keep the table cell small. There is a cost to this: resolving
hash table collisions requires many more loads than simply looking at
the key in the same cache line as the insertion position. However, most
supported platforms should have a 64-bit CAS operation to update the
cell atomically.

To make the result of concurrent insertion deterministic, the cell
payloads must have a priority function. Defining one is pretty
straightforward: compare the two 32-bit numbers as a combined 64-bit
number. This means that types coming from inputs earlier on the command
line have a higher priority and are more likely to appear earlier in the
final PDB type stream than types from an input appearing later on the
link line.
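
The insertion rule can be sketched as below: a cell is one 64-bit value, the key is read from the side tables, and a compare-and-swap keeps whichever duplicate has the smaller packed value. This is a simplified illustration, not the code in this change: `GHashTableSketch`, `packCell`, and `kEmptyCell` are hypothetical names, ghashes are reduced to 64-bit integers, and the isItem bit is left out.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Marks a cell nobody has claimed yet. It compares greater than any packed
// cell, so it also acts as the lowest priority.
constexpr uint64_t kEmptyCell = UINT64_MAX;

// sourceIndex in the high half, typeIndex in the low half: comparing the
// packed values orders cells by input position first, then by type index.
inline uint64_t packCell(uint32_t sourceIndex, uint32_t typeIndex) {
  return (uint64_t(sourceIndex) << 32) | typeIndex;
}

class GHashTableSketch {
public:
  GHashTableSketch(std::size_t size,
                   const std::vector<std::vector<uint64_t>> &ghashes)
      : cells(size), ghashes(ghashes) {
    for (auto &c : cells)
      c.store(kEmptyCell, std::memory_order_relaxed);
  }

  // Inserts type (sourceIndex, typeIndex) and returns the slot holding its
  // ghash. A smaller packed value means higher priority and wins the slot.
  std::size_t insert(uint32_t sourceIndex, uint32_t typeIndex) {
    const uint64_t hash = ghashes[sourceIndex][typeIndex];
    const uint64_t cell = packCell(sourceIndex, typeIndex);
    const std::size_t n = cells.size();
    std::size_t slot = hash % n;
    for (std::size_t probes = 0; probes < n; ++probes, slot = (slot + 1) % n) {
      uint64_t cur = cells[slot].load();
      for (;;) {
        if (cur != kEmptyCell && hashOf(cur) != hash)
          break; // Slot belongs to a different type: keep probing.
        if (cur != kEmptyCell && cur <= cell)
          return slot; // An equal or higher priority duplicate is present.
        if (cells[slot].compare_exchange_weak(cur, cell))
          return slot; // Claimed an empty slot or displaced a lower priority.
        // The failed CAS reloaded cur; re-evaluate the same slot.
      }
    }
    assert(false && "table is full; the sketch never rehashes");
    return n;
  }

  uint64_t cellAt(std::size_t slot) const { return cells[slot].load(); }

private:
  // The key is not stored in the cell; it is read from the side tables.
  uint64_t hashOf(uint64_t cell) const {
    return ghashes[cell >> 32][cell & 0xffffffffu];
  }

  std::vector<std::atomic<uint64_t>> cells;
  const std::vector<std::vector<uint64_t>> &ghashes;
};
```

Because the winner of each slot is decided by the packed value alone, the final table contents do not depend on the order in which threads insert.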

After table insertion, the non-empty cells in the table can be copied
out of the main table and sorted by priority to determine the ordering
of the final type index stream. At this point, item and type records
must be separated, either by sorting or by splitting into two arrays,
and I chose sorting. This is why the GHashCell must contain the isItem
bit.

Once the final PDB TPI stream ordering is known, we need to compute a
mapping from source type index to PDB type index. To avoid starting over
from scratch and looking up every type again by its ghash, we save the
insertion position of every hash table insertion during the first
insertion phase. Because the table does not support rehashing, the
insertion position is stable. Using the array of insertion positions
indexed by source type index, we can replace the source type indices in
the ghash table cells with the PDB type indices.

Once the table cells have been updated to contain PDB type indices, the
mapping for each type source can be computed in parallel. Simply iterate
the list of cell positions and replace them with the PDB type index,
since the insertion positions are no longer needed.

Once we have a source to destination type index mapping for every type
source, there are no more data dependencies. We know which type records
are "unique" (not duplicates), and what their final type indices will
be. We can do the remapping in parallel, and accumulate type sizes and
type hashes in parallel by type source.

Lastly, TPI stream layout must be done serially. Accumulate all the type
records, sizes, and hashes, and add them to the PDB.

Differential Revision: https://reviews.llvm.org/D87805
2020-09-30 15:44:38 -07:00
Reid Kleckner
8d250ac3cd Revert "[PDB] Merge types in parallel when using ghashing"
This reverts commit 49b3459930.
2020-09-30 14:55:32 -07:00
Reid Kleckner
49b3459930 [PDB] Merge types in parallel when using ghashing
This makes type merging much faster (-24% on chrome.dll) when multiple
threads are available, but it slightly increases the time to link (+10%)
when /threads:1 is passed. With only one more thread, the new type
merging is faster (-11%). The output PDB should be identical to what it
was before this change.

To give an idea, here is the /time output placed side by side:
                              BEFORE    | AFTER
  Input File Reading:           956 ms  |  968 ms
  Code Layout:                  258 ms  |  190 ms
  Commit Output File:             6 ms  |    7 ms
  PDB Emission (Cumulative):   6691 ms  | 4253 ms
    Add Objects:               4341 ms  | 2927 ms
      Type Merging:            2814 ms  | 1269 ms  -55%!
      Symbol Merging:          1509 ms  | 1645 ms
    Publics Stream Layout:      111 ms  |  112 ms
    TPI Stream Layout:          764 ms  |   26 ms  trivial
    Commit to Disk:            1322 ms  | 1036 ms  -300ms
----------------------------------------- --------
Total Link Time:               8416 ms    5882 ms  -30% overall

The main source of the additional overhead in the single-threaded case
is the need to iterate all .debug$T sections up front to check which
type records should go in the IPI stream. See fillIsItemIndexFromDebugT.
With changes to the .debug$H section, we could pre-calculate this info
and eliminate the need to do this walk up front. That should restore
single-threaded performance back to what it was before this change.

This change will cause LLD to be much more parallel than it used to, and
for users who do multiple links in parallel, it could regress
performance. However, when the user is only doing one link, it's a huge
improvement. In the future, we can use NT worker threads to avoid
oversaturating the machine with work, but for now, this is such an
improvement for the single-link use case that I think we should land
this as is.

Algorithm
----------

Before this change, we essentially used a
DenseMap<GloballyHashedType, TypeIndex> to check if a type has already
been seen, and if it hasn't been seen, insert it now and use the next
available type index for it in the destination type stream. DenseMap
does not support concurrent insertion, and even if it did, the linker
must be deterministic: it cannot produce different PDBs by using
different numbers of threads. The output type stream must be in the same
order regardless of the order of hash table insertions.

In order to create a hash table that supports concurrent insertion, the
table cells must be small enough that they can be updated atomically.
The algorithm I used for updating the table using linear probing is
described in this paper, "Concurrent Hash Tables: Fast and General(?)!":
https://dl.acm.org/doi/10.1145/3309206

The GHashCell in this change is essentially a pair of 32-bit integer
indices: <sourceIndex, typeIndex>. The sourceIndex is the index of the
TpiSource object, and it represents an input type stream. The typeIndex
is the index of the type in the stream. Together, we have something like
a ragged 2D array of ghashes, which can be looked up as:
  tpiSources[tpiSrcIndex]->ghashes[typeIndex]

By using these side tables, we can omit the key data from the hash
table, and keep the table cell small. There is a cost to this: resolving
hash table collisions requires many more loads than simply looking at
the key in the same cache line as the insertion position. However, most
supported platforms should have a 64-bit CAS operation to update the
cell atomically.

To make the result of concurrent insertion deterministic, the cell
payloads must have a priority function. Defining one is pretty
straightforward: compare the two 32-bit numbers as a combined 64-bit
number. This means that types coming from inputs earlier on the command
line have a higher priority and are more likely to appear earlier in the
final PDB type stream than types from an input appearing later on the
link line.

After table insertion, the non-empty cells in the table can be copied
out of the main table and sorted by priority to determine the ordering
of the final type index stream. At this point, item and type records
must be separated, either by sorting or by splitting into two arrays,
and I chose sorting. This is why the GHashCell must contain the isItem
bit.

Once the final PDB TPI stream ordering is known, we need to compute a
mapping from source type index to PDB type index. To avoid starting over
from scratch and looking up every type again by its ghash, we save the
insertion position of every hash table insertion during the first
insertion phase. Because the table does not support rehashing, the
insertion position is stable. Using the array of insertion positions
indexed by source type index, we can replace the source type indices in
the ghash table cells with the PDB type indices.
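
The ordering and remapping steps can be sketched serially as follows. This is an illustration under stated assumptions: `assignFinalIndices`, `remapSource`, and `kEmptySlot` are hypothetical names, the table is a plain vector of packed 64-bit cells, and the separation of item and type records is ignored.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

constexpr uint64_t kEmptySlot = UINT64_MAX;

// Overwrites every occupied cell with its rank in priority order (the
// hypothetical final type index) and returns how many cells were occupied.
inline uint32_t assignFinalIndices(std::vector<uint64_t> &table) {
  // Pair each occupied cell with its insertion position in the table.
  std::vector<std::pair<uint64_t, std::size_t>> occupied;
  for (std::size_t pos = 0; pos < table.size(); ++pos)
    if (table[pos] != kEmptySlot)
      occupied.emplace_back(table[pos], pos);

  // Cells are unique, so sorting the pairs sorts by cell priority.
  std::sort(occupied.begin(), occupied.end());

  // Insertion positions are stable, so the table can be rewritten in place.
  uint32_t next = 0;
  for (const auto &entry : occupied)
    table[entry.second] = next++;
  return next;
}

// Maps one source's type indices to final indices through the insertion
// positions recorded while that source was inserted.
inline std::vector<uint32_t>
remapSource(const std::vector<uint64_t> &table,
            const std::vector<std::size_t> &insertPositions) {
  std::vector<uint32_t> finalIndices;
  finalIndices.reserve(insertPositions.size());
  for (std::size_t pos : insertPositions) {
    assert(pos < table.size() && table[pos] != kEmptySlot);
    finalIndices.push_back(static_cast<uint32_t>(table[pos]));
  }
  return finalIndices;
}
```

`remapSource` only reads the table, so in this sketch it could run once per type source on separate threads after `assignFinalIndices` has finished.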

Once the table cells have been updated to contain PDB type indices, the
mapping for each type source can be computed in parallel. Simply iterate
the list of cell positions and replace them with the PDB type index,
since the insertion positions are no longer needed.

Once we have a source to destination type index mapping for every type
source, there are no more data dependencies. We know which type records
are "unique" (not duplicates), and what their final type indices will
be. We can do the remapping in parallel, and accumulate type sizes and
type hashes in parallel by type source.

Lastly, TPI stream layout must be done serially. Accumulate all the type
records, sizes, and hashes, and add them to the PDB.

Differential Revision: https://reviews.llvm.org/D87805
2020-09-30 14:22:48 -07:00
Alexandre Ganea
f2efb5742c [LLD][COFF] Cover usage of LLD-as-a-library in tests
In lit tests, we run each LLD invocation twice (LLD_IN_TEST=2), without shutting down the process in-between. This ensures a full cleanup is properly done between runs.
Only active for the COFF driver for now. Other drivers still use LLD_IN_TEST=1 which executes just one iteration with full cleanup, like before.
When the environment variable LLD_IN_TEST is unset, a shortcut is taken, only one iteration is executed, no cleanup for faster exit, like before.
A public API, lld::safeLldMain(), is also available when using LLD as a library.

Differential Revision: https://reviews.llvm.org/D70378
2020-09-24 15:07:50 -04:00