clang-p2996

Author	SHA1	Message	Date
Maksim Panchenko	9b4328fbfa	[BOLT][NFC] Refactor RI::discoverFileObjects() Minor refactoring to delete redundant code. Reviewed By: jobnoorman Differential Revision: https://reviews.llvm.org/D159525	2023-09-18 11:29:29 -07:00
Maksim Panchenko	1e9b006add	[BOLT] Speedup symbol table sort Memoize SymbolRef::getAddress() for sorting symbol table entries by their address. Saves about 10 seconds of processing time on large binaries with over 2 million symbols. NFCI. Reviewed By: jobnoorman, Amir Differential Revision: https://reviews.llvm.org/D159524	2023-09-18 11:28:40 -07:00
Sinan Lin	7b4b09a59a	[Bolt] fix a relocation bug for R_AARCH64_CALL26 If the R_AARCH64_CALL26 relocation is against a symbol that has a lower address, encodeValueAArch64 returns a wrong value. Reviewed By: Kepontry, yota9 Differential Revision: https://reviews.llvm.org/D159513	2023-09-18 19:55:35 +08:00
zhoujiapeng	473b9dd442	[BOLT] Incorporate umask into the output file permission Fix https://github.com/llvm/llvm-project/issues/65061 Reviewed By: maksfb, Amir Differential Revision: https://reviews.llvm.org/D159407	2023-09-17 00:12:52 +08:00
zhoujiapeng	16fd879980	[BOLT] Skip the validation of CFG after it is finalized When current state is `CFG_Finalized`, function `validateCFG()` should return true directly. Reviewed By: maksfb, yota9, Kepontry Differential Revision: https://reviews.llvm.org/D159410	2023-09-17 00:07:59 +08:00
zhoujing	de6a919f77	[BOLT] Fix deadloop bug in taildup The intent is clearly to push the tail rather than current BB. Reviewed By: maksfb Differential Revision: https://reviews.llvm.org/D159289	2023-09-15 22:12:53 +08:00
Job Noorman	c5ba61978c	[BOLT][RISCV] Add support for linker relaxation Calls on RISC-V are typically compiled to `auipc`/`jalr` pairs to allow a maximum target range (32-bit pc-relative). In order to optimize calls to near targets, linker relaxation may replace those pairs with, for example, single `jal` instructions. To allow BOLT to freely reassign function addresses in relaxed binaries, this patch proposes the following approach: - Expand all relaxed calls back to `auipc`/`jalr`; - Rely on JITLink to relax those back to shorter forms where possible. This is implemented by detecting all possible call instructions and replacing them with `PseudoCALL` (or `PseudoTAIL`) instructions. The RISC-V backend then expands those and adds the necessary relocations for relaxation. Since BOLT generally ignores pseudo instructions, this patch makes `MCPlusBuilder::isPseudo` virtual so that `RISCVMCPlusBuilder` can override it to exclude `PseudoCALL` and `PseudoTAIL`. To ensure JITLink knows about the correct section addresses while relaxing, reassignment of addresses has been moved to a post-allocation pass. Note that this is probably the time it had to be done in the first place since in `notifyResolved` (where it was done before), all symbols are supposed to be resolved already. Depends on D159082 Reviewed By: maksfb Differential Revision: https://reviews.llvm.org/D159089	2023-09-15 11:57:28 +02:00
Amir Ayupov	4a6426a802	[BOLT][NFC] Simplify RI::selectFunctionsToProcess Reviewed By: #bolt, maksfb Differential Revision: https://reviews.llvm.org/D159516	2023-09-14 11:57:44 -07:00
Amir Ayupov	4627446d38	[BOLT] Fix AutoFDO output format after D154120 AutoFDO profile has no leading 0x in hex dumps. Reviewed By: #bolt, rafauler Differential Revision: https://reviews.llvm.org/D159507	2023-09-12 13:58:25 -07:00
Job Noorman	1cf2599a64	[BOLT] Prevent adding secondary entry points for BB labels When linker relaxation is enabled on RISC-V, every branch has a relocation and a corresponding symbol in the symbol table. BOLT currently registers all these symbols as secondary entry points causing almost every function to be marked as multi entry on RISC-V. This patch modifies `adjustFunctionBoundaries` to ignore these symbols. Note that I currently try to detect them by checking if a symbol's name starts with the private label prefix as defined by `MCAsmInfo`. Since I'm not entirely sure what multi-entry functions look like on different targets, please check if this condition is correct. Maybe it could make sense to only check this on RISC-V? Reviewed By: maksfb Differential Revision: https://reviews.llvm.org/D159285	2023-09-12 13:44:56 +02:00
Amir Ayupov	7b750943d7	[BOLT][NFC] Speedup YAML profile processing Reduce YAML profile processing times: - preprocessProfile: speed up buildNameMaps by replacing ProfileNameToProfile mapping with ProfileFunctionNames set and ProfileBFs vector. Pre-look up YamlBF->BF correspondence, memoize in ProfileBFs. - readProfile: replace iteration over all functions in the binary by iteration over profile functions (strict match and LTO name match). On a large binary (1.9M functions) and large YAML profile (121MB, 30k functions) reduces profile steps runtime: pre-process profile data: 12.4953s -> 10.7123s process profile data: 9.8195s -> 5.6639s Compared to fdata profile reading: pre-process profile data: 8.0268s process profile data: 1.0265s process profile data pre-CFG: 0.1644s Reviewed By: #bolt, maksfb Differential Revision: https://reviews.llvm.org/D159460	2023-09-11 16:07:57 -07:00
Amir Ayupov	ffef4fe0db	[BOLT][NFC] Use formatv in DataAggregator/DataReader prints Reviewed By: #bolt, maksfb Differential Revision: https://reviews.llvm.org/D154120	2023-09-11 16:01:02 -07:00
Job Noorman	1b78742e77	[BOLT][RISCV] Implement R_RISCV_PCREL_LO12_S (#65204) Relocation used for store instructions.	2023-09-09 08:22:37 +00:00
spupyrev	42da84fda9	[BOLT] Always match stale entry blocks Two (minor) improvements for stale matching: - always match entry blocks to each other, even if there is a hash mismatch; - ignore nops in (loose) hash computation. I record a small improvement in inference quality on my benchmarks. Tests are not affected. Reviewed By: Amir Differential Revision: https://reviews.llvm.org/D159488	2023-09-08 15:46:20 -07:00
David Spickett	f48bd86bb9	[bolt][X86] Correct 2 test RUN lines (#65252) One had an extra ;, which is odd but harmless. The other was missing ":" after RUN.	2023-09-08 08:42:45 +01:00
Amir Ayupov	04a6dc24db	[BOLT][test] Fix patch-entries for aarch64 buildbot (#65690)	2023-09-07 17:09:45 -07:00
Elvina Yakubova	6678f602c2	[BOLT][test] Fix cross-compilation tests after D151920 Fix tests that are failing in cross-compilation after D151920 (https://lab.llvm.org/buildbot/#/builders/221/builds/17715): - instrumentation-ind-call, basic-instrumentation: add -mno-outline-atomics flag to runtime lib - bolt-address-translation-internal-call, internal-call-instrument: add %cflags - meta-merge-fdata: restrict to x86_64 Reviewed By: Amir Differential Revision: https://reviews.llvm.org/D159094	2023-09-08 00:05:39 +03:00
Amir Ayupov	7248e57a4b	[BOLT][NFC] Fix duplicate word typo Based on https://reviews.llvm.org/D137338	2023-09-01 13:29:01 -07:00
Elvina Yakubova	777e268b81	[BOLT][test] Enable exceptions_split tests for AArch64 Since the issue with trap value is fixed in D158191, it now should pass on both platforms. Reviewed By: maksfb Differential Revision: https://reviews.llvm.org/D158899	2023-09-01 10:45:53 +03:00
Job Noorman	eafe4ee2e8	[BOLT] Rename isLoad/isStore to mayLoad/mayStore As discussed in D159266, for some instructions it's impossible to know statically if they will load/store (e.g., predicated instructions). Therefore, mayLoad/mayStore are more appropriate names.	2023-09-01 09:36:05 +02:00
Job Noorman	76f040bda6	[BOLT] Provide generic implementations for isLoad/isStore `MCInstrDesc` provides the `mayLoad` and `mayStore` flags that seem appropriate to use as a target-independent way to implement `isLoad` and `isStore`. I believe this is currently good enough to use for the RISC-V target as well. I've provided a test for this that checks the generated dyno stats (which seems to be the only thing both `isLoad` and `isStore` are used for). Reviewed By: maksfb Differential Revision: https://reviews.llvm.org/D159266	2023-09-01 09:36:05 +02:00
Amir Ayupov	8f9006bfa0	[BOLT][test] Move asm-dump.c to runtime/X86 Since the test executes instrumented version of the binary, move it under runtime/X86. Note that it can be adjusted to also run under AArch64 now that instrumentation is supported. Reviewed By: #bolt, maksfb Differential Revision: https://reviews.llvm.org/D159298	2023-08-31 10:59:28 -07:00
spupyrev	1256ef274c	[BOLT] Fine-tuning hash computation for stale matching Fine-tuning hash computation for stale matching: - introducing a new "loose" basic block hash that allows matching many more blocks than before; - tweaking params of the inference algorithm to find (slightly) better solutions; - adding more meaningful tests for stale matching. Tested the changes on several open-source benchmarks (clang, rocksdb, chrome) and one prod workload using different compiler modes (LTO/PGO etc). There is always an improvement in the quality of inferred profiles. (The current implementation is still not optimal but the diff is a step forward; I am open to further suggestions.) Reviewed By: Amir Differential Revision: https://reviews.llvm.org/D156278	2023-08-31 07:29:02 -07:00
Sinan Lin	9c99e9fd68	[BOLT] Fix a bug related to iterators in ReorderData pass If `Itr` is the last element, then `std::next(Itr)` is `Range.end()`, so the expression `std::next(Itr)->second` is undefined behavior. Reviewed By: yota9, maksfb Differential Revision: https://reviews.llvm.org/D159177	2023-08-31 11:10:25 +08:00
hezuoqiang	83f5497155	[BOLT] BL/BH are considered aliases in regreassign The relationship of X86 registers is shown in the diagram. BL and BH do not have a direct alias relationship. However, if the BH register cannot be swapped, then the BX/EBX/RBX registers cannot be swapped as well, which means that BL register also cannot be swapped. Therefore, in the presence of BX/EBX/RBX registers, BL and BH have an alias relationship. ┌────────────────┐ │ RBX │ ├────┬───────────┤ │ │ EBX │ ├────┴──┬────────┤ │ │ BX │ ├───────┼───┬────┤ │ │BH │BL │ └───────┴───┴────┘ Reviewed By: rafauler Differential Revision: https://reviews.llvm.org/D155098	2023-08-28 22:57:24 +08:00
Job Noorman	475a93a07a	[BOLT] Calculate output values using BOLTLinker BOLT uses `MCAsmLayout` to calculate the output values of functions and basic blocks. This means output values are calculated based on a pre-linking state and any changes to symbol values during linking will cause incorrect values to be used. This issue can be triggered by enabling linker relaxation on RISC-V. Since linker relaxation can remove instructions, symbol values may change. This causes, among other things, the symbol table created by BOLT in the output executable to be incorrect. This patch solves this issue by using `BOLTLinker` to get symbol values instead of `MCAsmLayout`. This way, output values are calculated based on a post-linking state. To make sure the linker can update all necessary symbols, this patch also makes sure all these symbols are not marked as temporary so that they end up in the object file's symbol table. Note that this patch only deals with symbols of binary functions (`BinaryFunction::updateOutputValues`). The technique described above turned out to be too expensive for basic block symbols so those are handled differently in D155604. Reviewed By: maksfb Differential Revision: https://reviews.llvm.org/D154604	2023-08-28 10:13:07 +02:00
Kazu Hirata	d791fa26a9	[BOLT] Use SmallPtrSet::contains (NFC)	2023-08-27 13:18:38 -07:00
Rafael Auler	b9deec1cd9	[BOLT] Fix cross-compilation build Don't enable BOLT runtime when cross compiling as we don't support this scenario yet. Differential Revision: https://reviews.llvm.org/D158906	2023-08-25 17:33:04 -07:00
Rafael Auler	b59cf211a0	[BOLT] Don't choke on injected functions' IO map AddressMap would fail lookup for injected functions and crash BOLT. Fix that. Reviewed By: #bolt, maksfb, jobnoorman Differential Revision: https://reviews.llvm.org/D158685	2023-08-24 12:02:55 -07:00
Rafael Auler	b5ac1697c8	[BOLT] Give precedence to first AddressMap entries When parsing AddressMap and there is a conflict in keys, where two entries share the same key, consider the first entry as the correct one, instead of the last. This matches previous behavior in BOLT and covers case such as BOLT creating a new basic block but sharing the same input offset of the previous (or entry) basic block. In this case, instead of translating debuginfo to use the newly created BB, translate using the BB that was originally read from input. This will increase our chances of getting debuginfo right. Tested via binary comparison in tests: X86/dwarf4-df-input-lowpc-ranges.test X86/dwarf5-df-input-lowpc-ranges.test Reviewed By: #bolt, maksfb, jobnoorman Differential Revision: https://reviews.llvm.org/D158686	2023-08-24 11:59:43 -07:00
Eymen Ünay	d7add58cff	[BOLT] Fix typo in comment Reviewed By: maksfb Differential Revision: https://reviews.llvm.org/D157206	2023-08-24 09:37:48 -07:00
Elvina Yakubova	83cb541f80	[BOLT][Instrumentation][test] Fix tests Extend tests for instrumentation Reviewed By: rafauler Differential Revision: https://reviews.llvm.org/D151920	2023-08-24 19:34:58 +03:00
Elvina Yakubova	87e9c42495	[BOLT][Instrumentation] AArch64 instrumentation support in runtime This commit adds support for AArch64 in instrumentation runtime library, including AArch64 system calls. Also this commit divides syscalls into target-specific files. Reviewed By: rafauler, yota9 Differential Revision: https://reviews.llvm.org/D151942	2023-08-24 19:34:57 +03:00
Elvina Yakubova	70405a0bf7	[BOLT][Instrumentation] Add support for MacOS counters This commit adds support for generation of getter counters for AArch64 MacOS. Continuation of work D151899 Reviewed By: rafauleir, yota9 Differential Revision: https://reviews.llvm.org/D151901	2023-08-24 19:34:57 +03:00
Elvina Yakubova	6e4c230525	[BOLT][Instrumentation] Initial instrumentation support for AArch64 This commit adds code generation for AArch64 instrumentation, including direct and indirect calls support. Reviewed By: rafauler, yota9 Differential Revision: https://reviews.llvm.org/D151899	2023-08-24 19:34:57 +03:00
Denis Revunov	82ed7896cf	[BOLT] Add test for emitting trap value Reviewed By: rafauler Differential Revision: https://reviews.llvm.org/D158191	2023-08-24 01:30:02 +03:00
Denis Revunov	28fd2ca142	[BOLT] Fix trap value for non-X86 The trap value used by BOLT was assumed to be a single-byte instruction. It made some functions unaligned on AArch64 (e.g., the exceptions-instrumentation test) and caused emission failures. Fix that by changing the fill value to StringRef. Reviewed By: rafauler Differential Revision: https://reviews.llvm.org/D158191	2023-08-24 01:29:41 +03:00
Denis Revunov	dfc7599296	[BOLT][Instrumentation] Add test for append-pid option Reviewed By: rafauler Differential Revision: https://reviews.llvm.org/D154121	2023-08-23 23:50:32 +03:00
Denis Revunov	a86dd9ae60	[BOLT][Instrumentation] Fix indirect call profile in PIE Because indirect call tables use static addresses for call sites, but pc values recorded by runtime may be subject to ASLR in PIE, we couldn't find indirect call descriptions by their runtime address in PIE. It resulted in [unknown] entries in profile for all indirect calls. We need to subtract the base address of .text from runtime addresses to get the corresponding static addresses. Here we create a getter for the base address of .text and subtract its return value from recorded PC values. It converts them to static addresses, which then may be used to find the corresponding indirect call descriptions. Reviewed By: rafauler Differential Revision: https://reviews.llvm.org/D154121	2023-08-23 23:50:31 +03:00
Denis Revunov	a799298152	[BOLT][Instrumentation] Keep profile open in WatchProcess When a binary is instrumented with --instrumentation-sleep-time and instrumentation-wait-forks options and launched, the profile is periodically written until all the forks die. The problem is that we cannot wait for the whole process tree, and we have no way to tell when it's safe to read the profile. However, if we keep the profile open throughout the life of the process tree, we can use fuser to determine when writing is finished. Reviewed By: rafauler Differential Revision: https://reviews.llvm.org/D154436	2023-08-23 23:50:31 +03:00
zhoujiapeng	62020a3a7e	[BOLT] Implement createRelocation for AArch64 The implementation is based on the X86 version, with the same code of symbol and addend extraction. The differences include the support for RelType `R_AARCH64_CALL26` and the deletion of 8-bit relocation. Reviewed By: rafauler Differential Revision: https://reviews.llvm.org/D156018	2023-08-23 00:53:32 +08:00
zhoujiapeng	9fee2ac044	[BOLT][NFC] Split createRelocation in X86 and share the second part This commit splits the createRelocation function for the X86 architecture into two parts, retaining the first half and moving the second half to a new function called extractFixupExpr. The purpose of this change is to make extractFixupExpr a shared function between AArch64 and X86 architectures, increasing code reusability and maintainability. Child revision: https://reviews.llvm.org/D156018 Reviewed By: Amir Differential Revision: https://reviews.llvm.org/D157217	2023-08-23 00:29:25 +08:00
Kazu Hirata	ff22d125a7	[BOLT] Fix an unused variable warning This patch fixes: bolt/lib/Core/BinaryFunction.cpp:4117:20: error: unused variable 'FragmentBaseAddress' [-Werror,-Wunused-variable]	2023-08-21 07:57:18 -07:00
Job Noorman	23c8d38258	[BOLT] Calculate input to output address map using BOLTLinker BOLT uses MCAsmLayout to calculate the output values of basic blocks. This means output values are calculated based on a pre-linking state and any changes to symbol values during linking will cause incorrect values to be used. This issue was first addressed in D154604 by adding all basic block symbols to the symbol table for the linker to resolve them. However, the runtime overhead of handling this huge symbol table turned out to be prohibitively large. This patch solves the issue in a different way. First, a temporary section containing [input address, output symbol] pairs is emitted to the intermediary object file. The linker will resolve all these references so we end up with a section of [input address, output address] pairs. This section is then parsed and used to: - Replace BinaryBasicBlock::OffsetTranslationTable - Replace BinaryFunction::InputOffsetToAddressMap - Update BinaryBasicBlock::OutputAddressRange Note that the reason this is more performant than the previous attempt is that these symbol references do not cause entries to be added to the symbol table. Instead, section-relative references are used for the relocations. Reviewed By: maksfb Differential Revision: https://reviews.llvm.org/D155604	2023-08-21 10:36:20 +02:00
Hans Wennborg	d158ee576b	bolt/test/X86/bug-function-layout-execount.s: Require x86 and asserts Follow-up to D152959: --debug-only= requires an asserts build. The test also needs the x86 target.	2023-08-18 14:02:05 +02:00
hezuoqiang	a37e8a4bdc	[BOLT] Consider Code Fragments during regreassign During register swapping, the code fragments associated with the function need to be swapped together (which may be generated during PGO optimization). Fix https://github.com/llvm/llvm-project/issues/59730 Reviewed By: rafauler Differential Revision: https://reviews.llvm.org/D141931	2023-08-18 16:46:18 +08:00
spupyrev	9460ebd130	[BOLT] Fix sorting functions by execution count I noticed that `-reorder-functions=exec-count` doesn't work as expected due to a bug in the comparison function (which isn't symmetric). It is questionable whether anyone would want to ever use the sorting method (as sorting by say density is much better in all cases) but it is probably better to fix the bug. Reviewed By: Amir Differential Revision: https://reviews.llvm.org/D152959	2023-08-16 15:08:18 -07:00
Alexander Yermolovich	2c784f7d26	[BOLT][DWARF] Fix handling of invalid DIE references The compiler can generate DIE references that are invalid. Previously BOLT could assert when writing out IR to .debug_info. Changed where DIE offsets are updated so that it is always done, making sure the assert is not triggered. Added more specific warnings, and the ability to print out the invalid referenced DIE offset when verbosity >=1. Reviewed By: Amir Differential Revision: https://reviews.llvm.org/D157746	2023-08-14 17:28:24 -07:00
Alexander Yermolovich	bce5743e21	[BOLT][DWARF] Fix location list order This bug crept in when CU partitioning was introduced. It manifests itself when there are CUs that use location lists that come before CUs that are part of thin-lto. BOLT processes CUs with cross CU references first (these are produced by thin-lto). When we wrote out all the location lists we did it in original order. Since DWARF4 uses offsets directly into .debug_loc, those offsets in DIEs became wrong. Reviewed By: Amir Differential Revision: https://reviews.llvm.org/D157908	2023-08-14 17:27:22 -07:00
Kazu Hirata	363be89c7d	[BOLT] Use static_assert (NFC)	2023-08-10 18:44:17 -07:00
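
Commit 9c99e9fd68 in the log above describes an iterator bug: when `Itr` is the last element, `std::next(Itr)` equals `Range.end()`, and dereferencing it is undefined behavior. The following is a minimal C++ sketch of the guarded pattern, not the actual ReorderData code. The `std::map<int, int>` container, the helper name `nextValueOr`, and the `Fallback` parameter are all assumptions made for illustration.

```cpp
#include <iterator>
#include <map>

// Hypothetical helper (not from BOLT): returns the value stored in the
// element that follows Itr, or Fallback when there is no such element.
int nextValueOr(const std::map<int, int> &Range,
                std::map<int, int>::const_iterator Itr, int Fallback) {
  // Advancing the end iterator is itself invalid, so check it first.
  if (Itr == Range.end())
    return Fallback;
  auto Next = std::next(Itr);
  // Next may be Range.end() when Itr is the last element; dereferencing
  // it in that case would be undefined behavior.
  if (Next == Range.end())
    return Fallback;
  return Next->second;
}
```

Calling `nextValueOr(M, std::prev(M.end()), -1)` on a non-empty map `M` returns the fallback instead of reading past the last element.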

1823 Commits