clang-p2996

Author	SHA1	Message	Date
Fangrui Song	26ddf4eee2	[ELF] Change .debug_names tombstone value to UINT32_MAX/UINT64_MAX (#74686 ) `clang -g -gpubnames -fdebug-types-section` now emits a .debug_names section with references to local type unit entries defined in COMDAT .debug_info sections. ``` .section .debug_info,"G",@progbits,5657452045627120676,comdat .Ltu_begin0: ... .section .debug_names,"",@progbits ... // DWARF32 .long .Ltu_begin0 # Type unit 0 // DWARF64 // .long .Ltu_begin0 # Type unit 0 ``` When `.Ltu_begin0` is relative to a non-prevailing .debug_info section, the relocation resolves to 0, which is a valid offset within the .debug_info section. ``` cat > a.cc <<e struct A { int x; }; inline A foo() { return {1}; } int main() { foo(); } e cat > b.cc <<e struct A { int x; }; inline A foo() { return {1}; } void use() { foo(); } e clang++ -g -gpubnames -fdebug-types-section -fuse-ld=lld a.cc b.cc -o old ``` ``` % llvm-dwarfdump old ... Local Type Unit offsets [ LocalTU[0]: 0x00000000 ] ... Local Type Unit offsets [ LocalTU[0]: 0x00000000 // indistinguishable from a valid offset within .debug_info ] ``` https://dwarfstd.org/issues/231013.1.html proposes that we use a tombstone value instead to inform consumers. This patch implements the idea. The second LocalTU entry will now use 0xffffffff. https://reviews.llvm.org/D84825 has a TODO that we should switch the tombstone value for most `.debug_*` sections to UINT64_MAX. We have postponed the change for more than three years for consumers to migrate. At some point we shall make the change, so that .debug_names is no longer different from other debug sections that are not .debug_loc/.debug_ranges. Co-authored-by: Alexander Yermolovich <ayermolo@meta.com>	2023-12-21 18:59:11 -08:00
Kazu Hirata	732bccb8c1	Use StringRef::{starts,ends}_with (NFC) This patch replaces uses of StringRef::{starts,ends}with with StringRef::{starts,ends}_with for consistency with std::{string,string_view}::{starts,ends}_with in C++20. I'm planning to deprecate and eventually remove StringRef::{starts,ends}with.	2023-12-14 07:53:20 -08:00
Fangrui Song	42e4967140	[ELF] Don't create copy relocation/canonical PLT entry for a defined symbol (#75095 ) Copy relocations and canonical PLT entries are for symbols defined in a DSO. Currently we create them even for a `Defined`, possibly leading to an output that won't work at run-time (e.g. R_X86_64_JUMP_SLOT referencing a null symbol). ``` % cat a.s .globl _start, main .type main, @function _start: main: ret .rodata .quad main % clang -fuse-ld=lld -pie -nostdlib a.s % readelf -Wr a.out Relocation section '.rela.plt' at offset 0x290 contains 1 entry: Offset Info Type Symbol's Value Symbol's Name + Addend 00000000000033b8 0000000000000007 R_X86_64_JUMP_SLOT 12b0 ``` Report an error instead for the default `-z text` mode. GNU ld reports an error in `-z text` mode as well.	2023-12-12 10:14:36 -08:00
Fangrui Song	3fd1d6953d	[ELF] relocateNonAlloc: clean up workaround code relocateNonAlloc is costly for .debug_* section relocating. We don't want to burn CPU cycles on other targets' workarounds. Remove a temporary workaround for Linux objtool after a proper fix https://git.kernel.org/linus/b8ec60e1186cdcfce41e7db4c827cb107e459002 Move the R_386_GOTPC workaround for GCC<8 beside the R_PC workaround.	2023-12-07 12:43:40 -08:00
Fangrui Song	a4d4b45aef	[ELF] relocateNonAlloc: move likely expr == R_ABS before unlikely R_SIZE. NFC	2023-12-07 12:10:42 -08:00
Fangrui Song	23d402e5b7	[ELF] IWYU <optional> NFC	2023-12-06 15:18:46 -08:00
Philip Reames	62213be872	[LLD][RISCV] Fix incorrect call relaxation when mixing +c and -c objects (#73977 ) This fixes a mis-link when mixing compressed and non-compressed input to LLD. When relaxing calls, we must respect the source file that the section came from when deciding whether it's legal to use compressed instructions. If the call in question comes from a non-rvc source, then it will not expect 2-byte alignments and cascading failures may result. This fixes https://github.com/llvm/llvm-project/issues/63964. The symptom seen there is that a later RISCV_ALIGN can't be satisfied and we either fail an assert or produce a totally bogus link result. (It can be easily reproduced by putting .p2align 5 right before the nop in the reduced test case and running check-lld on an assertions-enabled build.) However, it's important to note this is just one possible symptom of the problem. If the resulting binary has a runtime switch between rvc and non-rvc routines (via e.g. ifuncs), then even if we manage to link we may execute invalid instructions on a machine which doesn't implement compressed instructions.	2023-12-01 11:02:53 -08:00
Adrian Prantl	2c07181424	[LEB128] Don't initialize error on success This change removes an unnecessary branch from a hot path. It's also questionable API to override any previous error unconditionally.	2023-11-29 12:47:27 -08:00
Adrian Prantl	69b0cb9c56	Revert "[LEB128] Don't initialize error on success" This reverts commit `545c8e009e`.	2023-11-29 12:40:37 -08:00
Adrian Prantl	545c8e009e	[LEB128] Don't initialize error on success This change removes an unnecessary branch from a hot path. It's also questionable API to override any previous error unconditionally.	2023-11-29 12:16:32 -08:00
Weining Lu	84a20989c6	[lld][LoongArch] Add another corner testcase for elf::getLoongArchPageDelta Similar to `e752b58e0d`.	2023-11-25 20:38:45 +08:00
Fangrui Song	7ffabb61a5	[ELF] Support R_RISCV_SET_ULEB128/R_RISCV_SUB_ULEB128 in non-SHF_ALLOC sections (#72610 ) For a label difference like `.uleb128 A-B`, MC generates a pair of R_RISCV_SET_ULEB128/R_RISCV_SUB_ULEB128 if A-B cannot be folded as a constant. GNU assembler generates a pair of relocations in more cases (when A or B is in a code section with linker relaxation). `.uleb128 A-B` is primarily used by DWARF v5 .debug_loclists/.debug_rnglists (DW_LLE_offset_pair/DW_RLE_offset_pair entry kinds) implemented in Clang and GCC. `.uleb128 A-B` can be used in SHF_ALLOC sections as well (e.g. `.gcc_except_table`). This patch does not handle SHF_ALLOC. `-z dead-reloc-in-nonalloc=` can be used to change the relocated value, if the R_RISCV_SET_ULEB128 symbol is in a discarded section. We don't check the R_RISCV_SUB_ULEB128 symbol since for the expected cases A and B should be defined in the same input section.	2023-11-21 07:43:29 -08:00
dong jianqiang	89f095d204	[lld][ELF] Add armeb support when incoming bc is arm big endian (#72604 ) Add armeb support when incoming bc is arm big endian: Fix error: could not infer e_machine from bitcode target triple armebv7-linux-gnueabi.	2023-11-20 21:12:06 -08:00
Fangrui Song	b8dface221	[ELF] -r: rename orphan SHT_REL/SHT_RELA when the relocated input section is placed in an output section This ports https://reviews.llvm.org/D40652 (--emit-relocs) to -r and matches GNU ld. Close #67910	2023-11-17 22:38:15 -08:00
Brad Smith	3a12001925	[lld][ELF] Recognize sparcv9 bitcode (#72609 )	2023-11-17 16:16:20 -05:00
Fangrui Song	ae7fb21b5a	[ELF] Make some InputSection/InputFile member functions const. NFC	2023-11-16 20:24:14 -08:00
Fangrui Song	255ea48608	[ELF] Merge verdefIndex into versionId. NFC (#72208 ) The two fields are similar. `versionId` is the Verdef index in the output file. It is set for `--exclude-id=`, version script patterns, and `sym@ver` symbols. `verdefIndex` is the Verdef index of a Sharedfile (SharedSymbol or a copy-relocated Defined), the default value -1 is also used to indicate that the symbol has not been matched by a version script pattern (https://reviews.llvm.org/D65716). It seems confusing to have two fields. Merge them so that we can allocate one bit for #70130 (suppress --no-allow-shlib-undefined error in the presence of a DSO definition).	2023-11-16 01:03:52 -08:00
Fangrui Song	e84575449f	Revert "[ELF] Merge verdefIndex into versionId. NFC" #72208 (#72484 ) Reverts llvm/llvm-project#72208 If a unversioned Defined preempts a versioned DSO definition, the version ID will not be reset.	2023-11-15 23:14:07 -08:00
Jinyang He	72accbfd0a	[lld][LoongArch] Support the R_LARCH_{ADD,SUB}6 relocation type (#72190 ) The R_LARCH_{ADD,SUB}6 relocation types are usually used by DwarfCFA to calculate a tiny offset. They appear after binutils 2.41, with GAS enabling relaxation by default.	2023-11-15 09:57:45 +08:00
Fangrui Song	667ea2ca40	[ELF] Merge verdefIndex into versionId. NFC (#72208 ) The two fields are similar. `versionId` is the Verdef index in the output file. It is set for version script patterns and `sym@ver` symbols. `verdefIndex` is the Verdef index of a SharedSymbol. The default value -1 is also used to indicate that the symbol has not been matched by a version script pattern (https://reviews.llvm.org/D65716). It seems confusing to have two fields. Merge them so that we can allocate one bit for #70130 (suppress --no-allow-shlib-undefined error in the presence of a DSO definition).	2023-11-14 10:20:21 -08:00
spupyrev	ef6d187115	[ELF] Fix assertion in cdsort (#71708 ) It seems that some functions (.text.unlikely.xxx) may have zero size, which makes some builds with enabled assertions fail. Removing the assertion and extending one test to fix the build. The sorting can process such zero-sized functions so no changes there are needed	2023-11-08 12:34:36 -08:00
Fangrui Song	339f5f727a	[ELF] Set `file` for synthesized _binary_ symbols Ensure the property that non-null `section` implies non-null `file`.	2023-11-06 12:05:13 -08:00
spupyrev	b53c04a8da	Reapply [ELF] Making cdsort default for function reordering (#68638 ) Edited lld/ELF/Options.td to cdsort as well CDSort function reordering outperforms the existing default heuristic ( hfsort/C^3) in terms of the performance of generated binaries while being (almost) as fast. Thus, the suggestion is to change the default. The speedup is up to 1.5% perf for large front-end binaries, and can be moderate/neutral for "small" benchmarks. High-level perf impact on two selected binaries: clang-10 binary (built with LTO+AutoFDO/CSSPGO): wins on top of C^3 in [0.3%..0.8%] rocksDB-8 binary (built with LTO+CSSPGO): wins on top of C^3 in [0.8%..1.5%] More detailed measurements on the clang binary are available [here](https://reviews.llvm.org/D152834#4445042)	2023-11-03 16:03:06 -07:00
Fangrui Song	b169e7fedd	[ELF] Improve undefined symbol message w/ DW_TAG_variable of the enclosing symbol but w/o line number information (#70854 ) The undefined symbol message suggests the source line when line number information is available (see https://reviews.llvm.org/D31481). When the undefined symbol is from a global variable, we won't get the line information. ``` extern int undef; namespace ns { int *var[] = { &undef }; // DW_TAG_variable(DW_AT_decl_file/DW_AT_decl_line) is available while // line number information is unavailable. } ld.lld: error: undefined symbol: undef >>> referenced by undef-debug2.cc >>> undef-debug2.o:(ns::var) ``` This patch utilizes `getEnclosingSymbol` to locate `var` and find DW_TAG_variable for `var`: ``` ld.lld: error: undefined symbol: undef >>> referenced by undef-debug2.cc:3 (/tmp/c/undef-debug2.cc:3) >>> undef-debug2.o:(ns::var) ```	2023-11-03 13:53:36 -07:00
Fangrui Song	56aa727907	[ELF] Add getEnclosingSymbol for code sharing. NFC	2023-11-03 13:43:28 -07:00
Fangrui Song	49168b2512	[ELF] Enhance --no-allow-shlib-undefined to report non-exported definition (#70769 ) For a DSO with all DT_NEEDED entries accounted for, if it contains an undefined non-weak symbol that shares a name with a non-exported definition (hidden visibility or localized by a version script), and there is no DSO definition, we should also report an error. Because the definition is not exported, it cannot resolve the DSO reference at runtime. GNU ld introduced this error-checking in [April 2003](https://sourceware.org/pipermail/binutils/2003-April/026568.html). The feature is available for executable links but not for -shared, and it is orthogonal to --no-allow-shlib-undefined. We make the feature part of --no-allow-shlib-undefined and work with -shared when --no-allow-shlib-undefined is specified. A subset of this error-checking is covered by commit `1981b1b6b9` for --gc-sections discarded sections. This patch covers non-discarded sections as well. Internally, I have identified 2 bugs (which would fail with LD_BIND_NOW=1) covered by commit `1981b1b6b9`	2023-11-03 11:05:09 -07:00
Yaxun (Sam) Liu	3594769f20	[ELF] Define NOMINMAX to fix zlib.h caused build failure on Windows (#70368 ) On Windows when zlib is enabled, zlib header introduced some Windows headers which defines max as a macro. Since OutputSections.cpp uses std::max with template argument, this causes compilation error. Define macro NOMINMAX to avoid this.	2023-11-02 08:59:54 -04:00
Fangrui Song	a40f651a06	[ELF] adjustOutputSections: don't copy SHF_EXECINSTR when an output does not contain input sections (#70911 ) For an output section with no input section, GNU ld eliminates the output section when there are only symbol assignments (e.g. `.foo : { symbol = 42; }`) but not for `.foo : { . += 42; }` (`SHF_ALLOC\|SHF_WRITE`). We choose to retain such an output section with a symbol assignment (unless unreferenced `PROVIDE`). We copy the previous section flag (see https://reviews.llvm.org/D37736) to hopefully make the current PT_LOAD segment extend to the current output section: * decrease the number of PT_LOAD segments * If a new PT_LOAD segment is introduced without a page-size alignment as a separator, there may be a run-time crash. However, this `flags` copying behavior is not suitable for `.foo : { . += 42; }` when `flags` contains `SHF_EXECINSTR`. The executable bit is surprising (https://discourse.llvm.org/t/lld-output-section-flag-assignment-behavior/74359). I think we should drop SHF_EXECINSTR when copying `flags`. The risk is a code section followed by `.foo : { symbol = 42; }` will be broken, which I believe is unrelated as such uses are almost always related to data sections. For data-command-only output sections (e.g. `.foo : { QUAD(42) }`), we keep allowing copyable SHF_WRITE. Some tests are updated to drop the SHF_EXECINSTR flag. GNU ld doesn't set SHF_EXECINSTR as well, though it sets SHF_WRITE for some tests while we don't.	2023-11-01 22:35:28 -07:00
Fangrui Song	ec0e556e67	[ELF] Merge copyLocalSymbols and demoteLocalSymbolsInDiscardedSections (#69425 ) Follow-up to #69295: In `Writer<ELFT>::run`, the symbol passes are flexible: they can be placed almost everywhere before `scanRelocations`, with a constraint that the `computeIsPreemptible` pass must be invoked for linker-defined non-local symbols. Merge copyLocalSymbols and demoteLocalSymbolsInDiscardedSections to simplify code: * Demoting local symbols can be made unconditional, not constrained to /DISCARD/ uses due to performance concerns * `includeInSymtab` can be made faster * Make symbol passes close to each other * Decrease data cache misses due to saving an iteration over local symbols There is no speedup, likely due to the unconditional `dr->section` access in `demoteAndCopyLocalSymbols`. `gc-sections-tls.s` no longer reports an error because the TLS symbol is converted to an Undefined.	2023-10-18 08:56:17 -07:00
Fangrui Song	bbf7b9d805	[ELF] Remove unused setSymbolAndType after #69295 . NFC One use of setSymbolAndType (related to https://reviews.llvm.org/D53864 "Do not crash when -r output uses linker script with `/DISCARD/`") is no longer needed after commit `1981b1b6b9` demotes symbols in discarded sections to Undefined.	2023-10-17 15:08:13 -07:00
Fangrui Song	1981b1b6b9	[ELF] Demote symbols in /DISCARD/ discarded sections to Undefined (#69295 ) When an input section is matched by /DISCARD/ in a linker script, GNU ld reports errors for relocations referencing symbols defined in the section: `.aaa' referenced in section `.bbb' of a.o: defined in discarded section `.aaa' of a.o Implement the error by demoting eligible symbols to `Undefined` and changing STB_WEAK to STB_GLOBAL. As a side benefit, in relocatable links, relocations referencing symbols defined relative to /DISCARD/ discarded sections no longer set symbol/type to zeros. It's arguable whether a weak reference to a discarded symbol should lead to errors. GNU ld reports an error and our demoting approach reports an error as well. Close #58891 Co-authored-by: Bevin Hansson <bevin.hansson@ericsson.com>	2023-10-17 14:10:52 -07:00
Fangrui Song	fc5d815d54	[ELF] Merge demoteSymbols and isPreemptible computation. NFC Remove one iteration of symtab and slightly improve the performance.	2023-10-17 13:52:08 -07:00
Fangrui Song	e9b9a1d320	[ELF] Move demoteSymbols to Writer.cpp. NFC History of demoteSharedSymbols: * https://reviews.llvm.org/D45536 demotes SharedSymbol * https://reviews.llvm.org/D111365 demotes lazy symbols * The pending #69295 will demote symbols defined in discarded sections The pass is placed after markLive just to be clear that it needs `isNeeded` information computed by markLive. The remaining passes in Driver.cpp do not use symbol information. Move the pass to Writer.cpp to be closer to other symbol-related passes.	2023-10-17 13:16:50 -07:00
Fangrui Song	60b3e05967	[ELF] Restore the --call-graph-profile-sort=hfsort default before #68638 The high time complexity of cache-directed sort is a real issue and is not appropriate as the default, at least for now (https://github.com/llvm/llvm-project/pull/68638#issuecomment-1760918891).	2023-10-12 22:58:42 -07:00
Kazu Hirata	4a0ccfa865	Use llvm::endianness::{big,little,native} (NFC) Note that llvm::support::endianness has been renamed to llvm::endianness while becoming an enum class as opposed to an enum. This patch replaces support::{big,little,native} with llvm::endianness::{big,little,native}.	2023-10-12 21:21:45 -07:00
Kazu Hirata	b8885926f8	Use llvm::endianness::{big,little,native} (NFC) Note that llvm::support::endianness has been renamed to llvm::endianness while becoming an enum class as opposed to an enum. This patch replaces llvm::support::{big,little,native} with llvm::endianness::{big,little,native}.	2023-10-10 22:54:51 -07:00
spupyrev	d5c1d735ad	[ELF] Making cdsort default for function reordering (#68638 ) CDSort function reordering outperforms the existing default heuristic ( hfsort/C^3) in terms of the performance of generated binaries while being (almost) as fast. Thus, the suggestion is to change the default. The speedup is up to 1.5% perf for large front-end binaries, and can be moderate/neutral for "small" benchmarks. High-level perf impact on two selected binaries: clang-10 binary (built with LTO+AutoFDO/CSSPGO): wins on top of C^3 in [0.3%..0.8%] rocksDB-8 binary (built with LTO+CSSPGO): wins on top of C^3 in [0.8%..1.5%] More detailed measurements on the clang binary are available [here](https://reviews.llvm.org/D152834#4445042)	2023-10-10 09:06:31 -07:00
Mitch Phillips	144d127bef	[lld] [MTE] Drop MTE globals for fully static executables, not ban (#68217 ) Integrating MTE globals on Android revealed a lot of cases where libraries are built as both archives and DSOs, and they're linked into fully static and dynamic executables respectively. MTE globals doesn't work for fully static executables. They need a dynamic loader to process the special R_AARCH64_RELATIVE relocation semantics with the encoded offset. Fully static executables that had out-of-bounds derived symbols (like 'int* foo_end = foo[16]') crash under MTE globals w/ static executables. So, LLD in its current form simply errors out when you try and compile a fully static executable that has a single MTE global variable in it. It seems like a much better idea to simply have LLD not do the special work for MTE globals in fully static contexts, and to drop any unnecessary metadata. This means that you can build archives with MTE globals and link them into both fully-static and dynamic executables.	2023-10-10 17:32:10 +02:00
Kazu Hirata	d7b18d5083	Use llvm::endianness{,::little,::native} (NFC) Now that llvm::support::endianness has been renamed to llvm::endianness, we can use the shorter form. This patch replaces llvm::support::endianness with llvm::endianness.	2023-10-09 00:54:47 -07:00
Ben Shi	488a62f86e	[lld][ELF][AVR] Add range check for R_AVR_13_PCREL (#67636 ) Some large AVR programs (for devices without long jump) may exceed 128KiB, and lld should report explicit errors rather than silently generating wrong executables.	2023-10-06 18:20:03 +08:00
Arthur Eubanks	9d6ec280fc	[lld/ELF] Don't relax R_X86_64_(REX_)GOTPCRELX when offset is too far For each R_X86_64_(REX_)GOTPCRELX relocation, check that the offset to the symbol is representable with 2^32 signed offset. If not, add a GOT entry for it and set its expr to R_GOT_PC so that we emit the GOT load instead of the relaxed lea. Do this in finalizeAddressDependentContent() where we iteratively attempt this (e.g. RISCV uses this for relaxation, ARM uses this to insert thunks). Decided not to do the opposite of inserting GOT entries initially and removing them when relaxable because removing GOT entries isn't simple. One drawback of this approach is that if we see any GOTPCRELX relocation, we'll create an empty .got even if it's not required in the end. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D157020	2023-10-04 13:03:56 -07:00
Alexandre Ganea	a2ef046a2d	[LLD][ELF] Import `ObjFile::importCmseSymbols` at call site (#68025 ) Before this patch, with MSVC I was seeing: ``` [304/334] Building CXX object tools\lld\ELF\CMakeFiles\lldELF.dir\InputFiles.cpp.obj C:\git\llvm-project\lld\ELF\InputFiles.h(327): warning C4661: 'void lld::elf::ObjFile<llvm::object::ELF32LE>::importCmseSymbols(void)': no suitable definition provided for explicit template instantiation request C:\git\llvm-project\lld\ELF\InputFiles.h(291): note: see declaration of 'lld::elf::ObjFile<llvm::object::ELF32LE>::importCmseSymbols' C:\git\llvm-project\lld\ELF\InputFiles.h(327): warning C4661: 'void lld::elf::ObjFile<llvm::object::ELF32LE>::redirectCmseSymbols(void)': no suitable definition provided for explicit template instantiation request C:\git\llvm-project\lld\ELF\InputFiles.h(292): note: see declaration of 'lld::elf::ObjFile<llvm::object::ELF32LE>::redirectCmseSymbols' ``` This patch removes `redirectCmseSymbols` which is not defined. And it imports `importCmseSymbols` in InputFiles.cpp, because it is already explicitly instantiated in ARM.cpp.	2023-10-03 10:12:09 -04:00
simpal01	3cde1d8000	[ELF] Handle relocations in synthetic .eh_frame with a non-zero offset within the output section (#65966 ) When the .eh_frame section is placed at a non-zero offset within its output section, the relocation values within .eh_frame are computed incorrectly. We had a similar issue in the .ARM.exidx section, which has already been fixed in https://reviews.llvm.org/D148033. While applying the relocation using S+A-P, the value of P (the location of the relocation) is computed incorrectly. P is: P = SecAddr + rel.offset, but SecAddr points to the starting address of the output section rather than the starting address of the .eh_frame section within that output section. This issue affects all targets that generate an .eh_frame section, so it is fixed in all the affected targets.	2023-10-03 10:20:14 +01:00
Matheus Izvekov	8ff77a8f04	[NFC][LLD] Refactor some copy-paste into the Common library (#67598 )	2023-09-28 00:06:48 +02:00
spupyrev	904b3f66f5	[ELF] A new code layout algorithm for function reordering [3a/3] We are bringing a new algorithm for function layout (reordering) based on the call graph (extracted from a profile data). The algorithm is an improvement on top of a known heuristic, C^3. It tries to co-locate hot and frequently executed together functions in the resulting ordering. Unlike C^3, it explores a larger search space and has an objective closely tied to the performance of instruction and i-TLB caches. Hence, the name CDS = Cache-Directed Sort. The algorithm can be used at the linking or post-linking (e.g., BOLT) stage. Refer to https://reviews.llvm.org/D152834 for the actual implementation of the reordering algorithm. This diff adds a linker option to replace the existing C^3 heuristic with CDS. The new behavior can be turned on by passing "--use-cache-directed-sort". (the plan is to make it default in a next diff) Perf-impact clang-10 binary (built with LTO+AutoFDO/CSSPGO): wins on top of C^3 in [0.3%..0.8%] rocksDB-8 binary (built with LTO+CSSPGO): wins on top of C^3 in [0.8%..1.5%] Note that function layout affects the perf the most on older machines (with smaller instruction/iTLB caches) and when huge pages are not enabled. The impact on newer processors with huge pages enabled is likely neutral/minor. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D152840	2023-09-26 06:24:34 -07:00
Fangrui Song	8c556b7e2b	[ELF] Change --call-graph-profile-sort to accept an argument Change the FF form --call-graph-profile-sort to --call-graph-profile-sort={none,hfsort}. This will be extended to support llvm/lib/Transforms/Utils/CodeLayout.cpp. --call-graph-profile-sort is not used in the wild but --no-call-graph-profile-sort is (Chromium). Make --no-call-graph-profile-sort an alias for --call-graph-profile-sort=none. Reviewed By: rahmanl Differential Revision: https://reviews.llvm.org/D159544	2023-09-25 09:49:40 -07:00
Fangrui Song	f5b42eaadb	[ELF] -r --compress-debug-sections: update implicit addends for .rel.debug_* referencing STT_SECTION symbols (#66804 ) https://reviews.llvm.org/D48929 updated addends for non-SHF_ALLOC sections relocated by REL for -r links, but the patch did not update the addends when --compress-debug-sections={zlib,zstd} is used (#66738). https://reviews.llvm.org/D116946 handled tombstone values in debug sections in relocatable links. As a side effect, both relocateNonAllocForRelocatable (using `sec->relocations`) and relocateNonAlloc (using raw REL/RELA) may run. Actually, we can adjust the condition in relocateNonAlloc to completely replace relocateNonAllocForRelocatable. This patch implements this idea and fixes #66738. As relocateNonAlloc processes the raw relocations like copyRelocations() does, the condition `if (config->relocatable && type != target.noneRel)` in `copyRelocations` (commit `08d6a3f133`, modified by https://reviews.llvm.org/D62052) can be made specific to SHF_ALLOC sections. As a side effect, we can now report diagnostics for PC-relative relocations for -r. This is a less useful diagnostic that is not worth too much code. As https://github.com/ClangBuiltLinux/linux/issues/1937 has violations, just suppress the warning for -r. Tested by commit `561b98f9e0`.	2023-09-20 14:50:13 -07:00
Fangrui Song	0de0b6dded	[ELF] Postpone "unable to move location counter backward" error (#66854 ) The size of .ARM.exidx may shrink across `assignAddress` calls. It is possible that the initial iteration has a larger location counter, causing `__code_size = __code_end - .; osec : { . += __code_size; }` to report an error, while the error would have been suppressed for subsequent `assignAddress` iterations. Other sections like .relr.dyn may change sizes across `assignAddress` calls as well. However, their initial size is zero, so it is difficult to trigger a similar error. Similar to https://reviews.llvm.org/D152170, postpone the error reporting. Fix #66836. While here, add more information to the error message.	2023-09-20 09:06:45 -07:00
Fangrui Song	678c1f142c	[ELF] Remove a R_ARM_PCA special case from relocateNonAlloc https://reviews.llvm.org/D75042 added a special case about R_ARM_PCA to relocateNonAlloc. This is untested and actually unused in the wild.	2023-09-19 21:04:50 -07:00
modimo	272bd6f9cc	[WPD][LLD] Add option to validate RTTI is enabled on all native types and prevent devirtualization on types with native RTTI Discussion about this approach: https://discourse.llvm.org/t/rfc-safer-whole-program-class-hierarchy-analysis/65144/18 When enabling WPD in an environment where native binaries are present, types we want to optimize can be derived from inside these native files and devirtualizing them can lead to correctness issues. RTTI can be used as a way to determine all such types in native files and exclude them from WPD providing a safe checked way to enable WPD. The approach is: 1. In the linker, identify if RTTI is available for all native types. If not, under `--lto-validate-all-vtables-have-type-infos` `--lto-whole-program-visibility` is automatically disabled. This is done by examining all .symtab symbols in object files and .dynsym symbols in DSOs for vtable (_ZTV) and typeinfo (_ZTI) symbols and ensuring there's always a match for every vtable symbol. 2. During thinlink, if `--lto-validate-all-vtables-have-type-infos` is set and RTTI is available for all native types, identify all typename (_ZTS) symbols via their corresponding typeinfo (_ZTI) symbols that are used natively or outside of our summary and exclude them from WPD. Testing: ninja check-all large Meta service that uses boost, glog and libstdc++.so runs successfully with WPD via --lto-whole-program-visibility. Previously, native types in boost caused incorrect devirtualization that led to crashes. Reviewed By: MaskRay, tejohnson Differential Revision: https://reviews.llvm.org/D155659	2023-09-18 15:51:49 -07:00
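
The tombstone behavior described in commit `26ddf4eee2` above can be illustrated with a minimal sketch. This is not LLD's code: the helper name `resolve_debug_names_reloc` and its parameters are hypothetical, and the only facts taken from the commit message are that a reference to a non-prevailing .debug_info section used to resolve to 0 and now resolves to UINT32_MAX (4-byte relocation) or UINT64_MAX (8-byte relocation).

```python
from typing import Optional


def resolve_debug_names_reloc(target_offset: Optional[int], width: int) -> int:
    """Return the value written for a .debug_names relocation.

    Hypothetical helper for illustration only (not an LLD function).
    target_offset: offset of the referenced type unit in the output
        .debug_info, or None when the unit is defined in a non-prevailing
        (discarded) COMDAT section.
    width: relocation width in bytes, 4 for DWARF32 or 8 for DWARF64.
    """
    if width not in (4, 8):
        raise ValueError("width must be 4 or 8")
    # All-ones value of the relocation width: UINT32_MAX or UINT64_MAX.
    tombstone = (1 << (8 * width)) - 1
    if target_offset is None:
        # A consumer can tell this apart from a valid offset such as 0.
        return tombstone
    return target_offset


if __name__ == "__main__":
    # Mirrors the two LocalTU entries in the commit message: the first
    # refers to the prevailing type unit, the second to a discarded copy.
    for offset in (0x0, None):
        print(f"LocalTU[0]: {resolve_debug_names_reloc(offset, 4):#010x}")
```

Run as a script, the sketch prints `0x00000000` for the prevailing unit and `0xffffffff` for the discarded one, matching the commit's statement that the second LocalTU entry now uses 0xffffffff.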

7236 Commits