clang-p2996

Author	SHA1	Message	Date
Alexander Yermolovich	41afc42673	[BOLT][DWARF][NFC] Set initial offset of DIE Setting initial offset of DIE to input DIE. This is to make "printf" debugging easier. Reviewed By: maksfb Differential Revision: https://reviews.llvm.org/D155031	2023-07-13 10:44:44 -07:00
Kazu Hirata	e71f9d264e	[BOLT] Fix an unused-variable warning This patch fixes: bolt/lib/Core/DIEBuilder.cpp:468:18: error: unused variable 'Ref' [-Werror,-Wunused-variable]	2023-07-10 15:51:58 -07:00
Alexander Yermolovich	dcfa2ab534	[BOLT][DWARF] Change to process and write out TUs first then CUs in batches To reduce memory footprint, changed so that we process and write out TUs first, reset DIEBuilder, and process CUs. CUs are processed in buckets. The first bucket contains all the CUs with cross-CU references. The rest are processed one at a time. clang-17 build in debug mode, by clang-17. before 8:25.81 real, 834.37 user, 86.03 sys, 0 amem, 79525064 mmem 8:02.20 real, 820.46 user, 81.81 sys, 0 amem, 79501616 mmem 7:52.69 real, 802.01 user, 83.99 sys, 0 amem, 79534392 mmem after 7:49.35 real, 822.04 user, 66.19 sys, 0 amem, 34934260 mmem 7:42.16 real, 825.46 user, 63.52 sys, 0 amem, 34951660 mmem 7:46.71 real, 821.11 user, 63.14 sys, 0 amem, 34981164 mmem Reviewed By: maksfb Differential Revision: https://reviews.llvm.org/D151909	2023-07-10 14:42:04 -07:00
Alexander Yermolovich	c33536e9c3	[BOLT][DWARF] Numerous fixes for a new DWARFRewriter * Some cleanup and minor fixes for the new debug information rewriter before moving on to productization. * The new rewriter wasn't handling binaries with DWARF5 and DWARF4 with -fdebug-types-sections. * Removed dead cross-CU reference code. * Added support for DW_AT_sibling. * With the new rewriter, abbrev numbers can change, which can lead to the offsets of Type Units changing. Before, we would just copy raw data. Changed to write out the Type Unit List. This is generated by gdb-add-index. * Fixed how BOLT handles gdb-index generated by gdb-11 with types sections. Simplified logic that handles variations of gdb-index. * Clang can generate two type units with the same hash, but different content. LLD does not de-duplicate when ThinLTO is involved. Changed so that TU hash and offset are used to make TUs unique. * It is possible to have references within a location expression to another DIE. Fixed it so that the relative offset is updated correctly. * Removed all the code related to patching. * Removed dead code. Changed how we handle writing out TUs and TU Index. It now should fully work for DWARF4 and DWARF5. * Removed unused arguments from some APIs, changed return type to void, and other small cleanups. Reviewed By: maksfb Differential Revision: https://reviews.llvm.org/D151906	2023-07-10 14:42:03 -07:00
Rui Zhong	87fb0ea27e	[BOLT][DWARF] Implement new mechanism for DWARFRewriter This revision implements a new mechanism for DWARFRewriter. In the new mechanism, we adopt the same approach as DWARFLinker. By parsing debug information into IR, we can handle debug information more flexibly. The debug information updating process now relies on the IR, and the IR is written out to the binary once updating has finished. A new class was added: DIEBuilder. This class is responsible for parsing debug information and raising it to the IR level. This class is also used to write out the .debug_info and .debug_abbrev sections. Since we output a brand new Abbrev section, we won't need to always convert low_pc/high_pc into ranges. When conversion does happen, we can also remove the low_pc entry. Reviewed By: maksfb, ayermolo Differential Revision: https://reviews.llvm.org/D130315	2023-07-10 14:42:03 -07:00
Kazu Hirata	544b4c6d52	[BOLT] Fix unused-variable warnings This patch fixes: bolt/lib/Core/DebugData.cpp:1669:20: error: unused variable 'StrOffset' [-Werror,-Wunused-variable] bolt/lib/Core/DebugData.cpp:1676:18: error: unused variable 'NewOffset' [-Werror,-Wunused-variable]	2023-07-09 18:04:50 -07:00
Nico Weber	de7781ea42	Revert "[DWARF][BOLT] Implement new mechanism for DWARFRewriter" This reverts commit `460a224443`. It breaks building on macOS, and it was landed with a review URL pointing to some Facebook-internal service. Also reverts a bunch of follow-ups: Revert "[BOLT][DWARF] Don't check string offsets" This reverts commit `f9d6f48c8b`. Revert "[BOLT][DWARF] Change to process and write out TUs first then CUs in batches" This reverts commit `88e95c1e4b`. Revert "[BOLT][DWARF] Output DWO files as they are being processed" This reverts commit `46ca2e3fcd`. Revert "[BOLT][DWARF] Don't check string offsets" This reverts commit `cfe4a4b04f`. Revert "[BOLT][DWARF] Numerous fixes for a new DWARFRewriter" This reverts commit `2701a661da`.	2023-07-07 08:07:01 -04:00
Alexander Yermolovich	88e95c1e4b	[BOLT][DWARF] Change to process and write out TUs first then CUs in batches Summary: To reduce memory footprint, changed so that we process and write out TUs first, reset DIEBuilder, and process CUs. CUs are processed in buckets. The first bucket contains all the CUs with cross-CU references. The rest are processed one at a time. clang-17 build in debug mode, by clang-17. before 8:25.81 real, 834.37 user, 86.03 sys, 0 amem, 79525064 mmem 8:02.20 real, 820.46 user, 81.81 sys, 0 amem, 79501616 mmem 7:52.69 real, 802.01 user, 83.99 sys, 0 amem, 79534392 mmem after 7:49.35 real, 822.04 user, 66.19 sys, 0 amem, 34934260 mmem 7:42.16 real, 825.46 user, 63.52 sys, 0 amem, 34951660 mmem 7:46.71 real, 821.11 user, 63.14 sys, 0 amem, 34981164 mmem Differential Revision: https://phabricator.intern.facebook.com/D45883198	2023-07-06 14:21:26 -07:00
Alexander Yermolovich	2701a661da	[BOLT][DWARF] Numerous fixes for a new DWARFRewriter Summary: * Some cleanup and minor fixes for the new debug information rewriter before moving on to productization. * The new rewriter wasn't handling binaries with DWARF5 and DWARF4 with -fdebug-types-sections. * Removed dead cross-CU reference code. * Added support for DW_AT_sibling. * With the new rewriter, abbrev numbers can change, which can lead to the offsets of Type Units changing. Before, we would just copy raw data. Changed to write out the Type Unit List. This is generated by gdb-add-index. * Fixed how BOLT handles gdb-index generated by gdb-11 with types sections. Simplified logic that handles variations of gdb-index. * Clang can generate two type units with the same hash, but different content. LLD does not de-duplicate when ThinLTO is involved. Changed so that TU hash and offset are used to make TUs unique. * It is possible to have references within a location expression to another DIE. Fixed it so that the relative offset is updated correctly. * Removed all the code related to patching. * Removed dead code. Changed how we handle writing out TUs and TU Index. It now should fully work for DWARF4 and DWARF5. * Removed unused arguments from some APIs, changed return type to void, and other small cleanups. Test Plan: Reviewers: Subscribers: Tasks: Tags: Differential Revision: https://phabricator.intern.facebook.com/D46168257	2023-07-06 14:21:26 -07:00
Alexander Yermolovich	460a224443	[DWARF][BOLT] Implement new mechanism for DWARFRewriter Summary: This revision implements a new mechanism for DWARFRewriter. In the new mechanism, we adopt the same approach as DWARFLinker. By parsing debug information into IR, we can handle debug information more flexibly. The debug information updating process now relies on the IR, and the IR is written out to the binary once updating has finished. A new class was added: DIEBuilder. This class is responsible for parsing debug information and raising it to the IR level. This class is also used to write out the .debug_info and .debug_abbrev sections. Since we output a brand new Abbrev section, we won't need to always convert low_pc/high_pc into ranges. When conversion does happen, we can also remove the low_pc entry. Differential Revision: https://phabricator.intern.facebook.com/D39484421 Tasks: T117448832	2023-07-06 14:21:26 -07:00
Alexander Yermolovich	66e943b1a9	[BOLT][DWARF] Fix for .debug_line with DWARF5 There was a bug in the code that pre-populated the line string for the case where parts of .debug_line are not processed by BOLT, but copied as raw data. We were not switching sections. This resulted in parts of the binary being overwritten with debug data. Reviewed By: maksfb Differential Revision: https://reviews.llvm.org/D154544	2023-07-06 11:37:08 -07:00
Maksim Panchenko	38639a8159	[BOLT][NFCI] Migrate Linux Kernel handling code to MetadataRewriter Create LinuxKernelRewriter and move kernel-specific code to this class. Depends on D154023 Reviewed By: Amir Differential Revision: https://reviews.llvm.org/D154024	2023-07-06 11:25:50 -07:00
Maksim Panchenko	98e2d63027	[BOLT][NFCI] Use MetadataRewriter interface to update SDT markers Migrate SDT markers processing to the new MetadataRewriter interface. Depends on D154020 Reviewed By: Amir Differential Revision: https://reviews.llvm.org/D154021	2023-07-06 11:17:17 -07:00
Amir Ayupov	59a27170c9	[BOLT][NFC] Simplify postProcessJumpTables Reviewed By: #bolt, rafauler Differential Revision: https://reviews.llvm.org/D154115	2023-06-29 22:47:16 -07:00
Job Noorman	b410d24a19	[BOLT][RISCV] Implement R_RISCV_ADD32/SUB32 This patch implements the R_RISCV_ADD32 and R_RISCV_SUB32 relocations for RISC-V. Reviewed By: rafauler Differential Revision: https://reviews.llvm.org/D146554	2023-06-22 09:35:54 +02:00
Kazu Hirata	a132f5eb77	[BOLT] Fix a warning in release builds This patch fixes: bolt/lib/Core/BinarySection.cpp:120:24: error: unused variable 'Relocation' [-Werror,-Wunused-variable]	2023-06-19 20:02:47 -07:00
Job Noorman	b4bb6211a5	[BOLT] Implement composed relocations BOLT currently assumes (and asserts) that no two relocations can share the same offset. Although this is true in most cases, ELF has a feature called (not sure if this is an official term) composed relocations [1] where multiple relocations at the same offset are combined to produce a single value. For example, to support label subtraction (a - b) on RISC-V, two relocations are emitted at the same offset: - R_RISCV_ADD32 a + 0 - R_RISCV_SUB32 b + 0 which, when combined, will produce the value of (a - b). To support this in BOLT, first, RelocationSetType in BinarySection is changed to be a multiset in order to allow it to store multiple relocations at the same offset. Next, Relocation::emit() is changed to receive an iterator pair of relocations. In most cases, these will point to a single relocation in which case its behavior is unaltered by this patch. For composed relocations, they should point to all relocations at the same offset and the following happens: - A new method Relocation::createExpr() is called for every relocation. This method is essentially the same as the original emit() except that it returns the MCExpr without emitting it. - The MCExprs of relocations i and i+1 are combined using the opcode returned by the new method Relocation::getComposeOpcodeFor(). - After combining all MCExprs, the last one is emitted. Note that in the current patch, getComposeOpcodeFor() simply calls llvm_unreachable() since none of the current targets use composed relocations. This will change once the RISC-V target lands. Finally, BinarySection::emitAsData() is updated to group relocations by offset and emit them all at once. Note that this means composed relocations are only supported in data sections. Since this is the only place they seem to be used in RISC-V, I believe it's reasonable to only support them there for now to avoid further code complexity. 
[1]: https://www.sco.com/developers/gabi/latest/ch4.reloc.html Reviewed By: rafauler Differential Revision: https://reviews.llvm.org/D146546	2023-06-19 17:11:08 +02:00
Job Noorman	f873029386	[BOLT] Add minimal RISC-V 64-bit support Just enough features are implemented to process a simple "hello world" executable and produce something that still runs (including libc calls). This was mainly a matter of implementing support for various relocations. Currently, the following are handled: - R_RISCV_JAL - R_RISCV_CALL - R_RISCV_CALL_PLT - R_RISCV_BRANCH - R_RISCV_RVC_BRANCH - R_RISCV_RVC_JUMP - R_RISCV_GOT_HI20 - R_RISCV_PCREL_HI20 - R_RISCV_PCREL_LO12_I - R_RISCV_RELAX - R_RISCV_NONE Executables linked with linker relaxation will probably fail to be processed. BOLT relocates .text to a high address while leaving .plt at its original (low) address. This causes PC-relative PLT calls that were relaxed to a JAL to not fit their offset in an I-immediate anymore. This is something that will be addressed in a later patch. Changes to the BOLT core are relatively minor. Two things were tricky to implement and needed slightly larger changes. I'll explain those below. The R_RISCV_CALL(_PLT) relocation is put on the first instruction of a AUIPC/JALR pair, the second does not get any relocation (unlike other PCREL pairs). This causes issues with the combinations of the way BOLT processes binaries and the RISC-V MC-layer handles relocations: - BOLT reassembles instructions one by one and since the JALR doesn't have a relocation, it simply gets copied without modification; - Even though the MC-layer handles R_RISCV_CALL properly (adjusts both the AUIPC and the JALR), it assumes the immediates of both instructions are 0 (to be able to or-in a new value). This will most likely not be the case for the JALR that got copied over. To handle this difficulty without resorting to RISC-V-specific hacks in the BOLT core, a new binary pass was added that searches for AUIPC/JALR pairs and zeroes-out the immediate of the JALR. A second difficulty was supporting ABS symbols. 
As far as I can tell, ABS symbols were not handled at all, causing __global_pointer$ to break. RewriteInstance::analyzeRelocation was updated to handle these generically. Tests are provided for all supported relocations. Note that in order to test the correct handling of PLT entries, an ELF file produced by GCC had to be used. While I tried to strip the YAML representation, it's still quite large. Any suggestions on how to improve this would be appreciated. Reviewed By: rafauler Differential Revision: https://reviews.llvm.org/D145687	2023-06-16 12:19:36 +02:00
Job Noorman	05634f7346	[BOLT] Move from RuntimeDyld to JITLink RuntimeDyld has been deprecated in favor of JITLink. [1] This patch replaces all uses of RuntimeDyld in BOLT with JITLink. Care has been taken to minimize the impact on the code structure in order to ease the inspection of this (rather large) changeset. Since BOLT relied on the RuntimeDyld API in multiple places, this wasn't always possible though and I'll explain the changes in code structure first. Design note: BOLT uses a JIT linker to perform what essentially is static linking. No linked code is ever executed; the result of linking is simply written back to an executable file. For this reason, I restricted myself to the use of the core JITLink library and avoided ORC as much as possible. RuntimeDyld contains methods for loading objects (loadObject) and symbol lookup (getSymbol). Since JITLink doesn't provide a class with a similar interface, the BOLTLinker abstract class was added to implement it. It was added to Core since both the Rewrite and RuntimeLibs libraries make use of it. Wherever a RuntimeDyld object was used before, it was replaced with a BOLTLinker object. There is one major difference between the RuntimeDyld and BOLTLinker interfaces: in JITLink, section allocation and the application of fixups (relocation) happens in a single call (jitlink::link). That is, there is no separate method like finalizeWithMemoryManagerLocking in RuntimeDyld. BOLT used to remap sections between allocating (loadObject) and linking them (finalizeWithMemoryManagerLocking). This doesn't work anymore with JITLink. Instead, BOLTLinker::loadObject accepts a callback that is called before fixups are applied which is used to remap sections. The actual implementation of the BOLTLinker interface lives in the JITLinkLinker class in the Rewrite library. It's the only part of the BOLT code that should directly interact with the JITLink API. 
For loading object, JITLinkLinker first creates a LinkGraph (jitlink::createLinkGraphFromObject) and then links it (jitlink::link). For the latter, it uses a custom JITLinkContext with the following properties: - Use BOLT's ExecutableFileMemoryManager. This one was updated to implement the JITLinkMemoryManager interface. Since BOLT never executes code, its finalization step is a no-op. - Pass config: don't use the default target passes since they modify DWARF sections in a way that seems incompatible with BOLT. Also run a custom pre-prune pass that makes sure sections without symbols are not pruned by JITLink. - Implement symbol lookup. This used to be implemented by BOLTSymbolResolver. - Call the section mapper callback before the final linking step. - Copy symbol values when the LinkGraph is resolved. Symbols are stored inside JITLinkLinker to ensure that later objects (i.e., instrumentation libraries) can find them. This functionality used to be provided by RuntimeDyld but I did not find a way to use JITLink directly for this. Some more minor points of interest: - BinarySection::SectionID: JITLink doesn't have something equivalent to RuntimeDyld's Section IDs. Instead, sections can only be referred to by name. Hence, SectionID was updated to a string. - There seem to be no tests for Mach-O. I've tested a small hello-world style binary but not more than that. - On Mach-O, JITLink "normalizes" section names to include the segment name. I had to parse the section name back from this manually which feels slightly hacky. [1] https://reviews.llvm.org/D145686#4222642 Reviewed By: rafauler Differential Revision: https://reviews.llvm.org/D147544	2023-06-15 11:13:52 +02:00
Maksim Panchenko	5c4d306a10	[BOLT][NFC] Change signature of MCPlusBuilder::isUnsupportedBranch() Make MCPlusBuilder::isUnsupportedBranch() take MCInst, not opcode. Reviewed By: Amir Differential Revision: https://reviews.llvm.org/D152765	2023-06-13 12:20:36 -07:00
Maksim Panchenko	43f56a2f27	[BOLT] Fix handling of code references from unmodified code In lite mode (default for X86), BOLT optimizes and relocates functions with profile. The rest of the code is preserved, but if it references relocated code such references have to be updated. The update is handled by scanExternalRefs() function. Note that we cannot solely rely on relocations written by the linker, as not all code references are exposed to the linker. Additionally, the linker can modify certain instructions and relocations will no longer match the code. With this change, start using symbolic disassembler for scanning code for references in scanExternalRefs(). Unlike the previous approach, the symbolizer properly detects and creates references for instructions with multiple/ambiguous symbolic operands and handles cases where a relocation doesn't match any operand. See test cases for examples. Reviewed By: Amir Differential Revision: https://reviews.llvm.org/D152631	2023-06-12 10:46:51 -07:00
Amir Ayupov	702fe36b70	[BOLT][NFC] Const-ify getDynamicRelocationAt Reviewed By: #bolt, maksfb Differential Revision: https://reviews.llvm.org/D152662	2023-06-12 09:55:16 -07:00
spupyrev	2316a10fe5	[BOLT] stale profile matching [part 2 out of 2] This is a first "serious" version of stale profile matching in BOLT. This diff extends the hash computation for basic blocks so that we can apply a fuzzy hash-based matching. The idea is to compute several "versions" of a hash value for a basic block. A loose version of a hash (computed by ignoring instruction operands) allows to match blocks in functions whose content has been changed, while stricter hash values (considering instruction opcodes with operands and even based on hashes of block's successors/predecessors) allow to resolve collisions. In order to save space and build time, individual hash components are blended into a single uint64_t. There are likely numerous ways of improving hash computation but already this simple variant provides significant perf benefits. Perf testing on the clang binary: collecting data on clang-10 and using it to optimize clang-11 (with ~1 year of commits in between). Next, we compare - //stale_clang// (clang-11 optimized with profile collected on clang-10 with infer-stale-profile=0) - //opt_clang// (clang-11 optimized with profile collected on clang-11) - //infer_clang// (clang-11 optimized with profile collected on clang-10 with infer-stale-profile=1) `LTO-only` mode: //stale_clang// vs //opt_clang//: task-clock [delta(%): 9.4252 ± 1.6582, p-value: 0.000002] (That is, there is a ~9.5% perf regression) //infer_clang// vs //opt_clang//: task-clock [delta(%): 2.1834 ± 1.8158, p-value: 0.040702] (That is, the regression is reduced to ~2%) Related BOLT logs: ``` BOLT-INFO: identified 2114 (18.61%) stale functions responsible for 30.96% samples BOLT-INFO: inferred profile for 2101 (18.52% of all profiled) functions responsible for 30.95% samples ``` `LTO+AutoFDO` mode: //stale_clang// vs //opt_clang//: task-clock [delta(%): 19.1293 ± 1.4131, p-value: 0.000002] //infer_clang// vs //opt_clang//: task-clock [delta(%): 7.4364 ± 1.3343, p-value: 0.000002] Related BOLT 
logs: ``` BOLT-INFO: identified 5452 (50.27%) stale functions responsible for 85.34% samples BOLT-INFO: inferred profile for 5442 (50.23% of all profiled) functions responsible for 85.33% samples ``` Reviewed By: Amir Differential Revision: https://reviews.llvm.org/D146661	2023-06-08 14:42:41 -07:00
Amir Ayupov	713b28532e	[BOLT][NFC] Fix debug messages Fix debug printing, making it easier to compare two debug logs side by side: - `BinaryFunction::addRelocation`: print function name instead of `this` ptr, - `DataAggregator::doTrace`: remove duplicated function name. Reviewed By: #bolt, maksfb Differential Revision: https://reviews.llvm.org/D152314	2023-06-06 15:50:58 -07:00
Sergei Barannikov	ee1d5f6372	[MC] Check if register is non-null before calling isSubRegisterEq (NFCI) D151036 adds an assertion that prohibits iterating over sub- and super-registers of a null register. This is already the case when iterating over register units of a null register, and worked by accident for sub- and super-registers. Reviewed By: Amir Differential Revision: https://reviews.llvm.org/D151285	2023-05-25 08:53:15 +03:00
Amir Ayupov	068e9889b1	[BOLT] Add isParentOf and isParentOrChildOf BF checks Add helper methods and simplify cases where we want to check if two functions are parent-child of each other (function-fragment relationship). Reviewed By: #bolt, rafauler Differential Revision: https://reviews.llvm.org/D142668	2023-05-19 17:51:54 -07:00
Amir Ayupov	b6f07d3ae8	[BOLT][NFC] Add MCPlusBuilder defOperands/useOperands helpers Make intent more explicit with the use of new helper methods. Reviewed By: #bolt, maksfb Differential Revision: https://reviews.llvm.org/D150810	2023-05-17 21:52:33 -07:00
Rafael Auler	77811752e3	[BOLT] Fix flush pending relocs https://github.com/facebookincubator/BOLT/pull/255 accidentally omitted a relocation type when refactoring the code. Add this type back and change function name so its intent is more clear. Reviewed By: #bolt, Amir Differential Revision: https://reviews.llvm.org/D150335	2023-05-11 11:52:32 -07:00
Amir Ayupov	6fcb91b2f7	[BOLT] Use opcode name in hashBlock Use MCInst opcode name instead of opcode value in hashing. Opcode values are unstable wrt changes to target tablegen definitions, and we notice that as output mismatches in NFC testing. This makes BOLT YAML profile tied to a particular LLVM revision which is less portable than offset-based fdata profile. Switch to using opcode names which have 1:1 mapping with opcode values for any given LLVM revision, and are stable wrt modifications to .td files (except of course modifications to names themselves). Test Plan: D150154 is a test commit adding new X86 instruction which shifts opcode values. With current change, pre-aggregated-perf.test passes in nfc check mode. Without current change, pre-aggregated-perf.test expectedly fails. Reviewed By: #bolt, rafauler Differential Revision: https://reviews.llvm.org/D150005	2023-05-08 18:54:29 -07:00
Alexander Yermolovich	93ce096502	[BOLT][DWARF] Fix handling of loclists_base without location accesses There are CUs that have DW_AT_loclists_base, but no DW_AT_location in children DIEs. Pre-bolt it points to a valid offset. We were not updating it, so it ended up pointing in the middle of a list and caused LLDB to print out errors. Changed it to point to first location list. I don't think it should matter since there are no accesses to it anyway. Reviewed By: maksfb Differential Revision: https://reviews.llvm.org/D149798	2023-05-03 20:50:37 -07:00
spupyrev	3e3a926be8	[BOLT][NFC] Add hash computation for basic blocks Extending yaml profile format with block hashes, which are used for stale profile matching. To avoid duplication of the code, created a new class with a collection of utilities for computing hashes. Reviewed By: Amir Differential Revision: https://reviews.llvm.org/D144306	2023-05-02 14:03:47 -07:00
Job Noorman	8bbfac7be1	[BOLT][NFC] Fix UB due to unaligned load in DebugStrOffsetsWriter The following tests fail when enabling UBSan due to an unaligned memory load: > runtime error: load of misaligned address 0x620000000643 for type > 'const uint32_t' (aka 'const unsigned int'), which requires 4 byte > alignment BOLT :: AArch64/asm-func-debug.test BOLT :: AArch64/update-debug-reloc.test BOLT :: X86/asm-func-debug.test BOLT :: X86/dwarf5-df-dualcu.test BOLT :: X86/dwarf5-df-mono-dualcu.test BOLT :: X86/dwarf5-ftypes-dwp-input-dwo-output.test BOLT :: X86/dwarf5-locaddrx.test BOLT :: X86/dwarf5-split-dwarf4-monolithic.test BOLT :: X86/inlined-function-mixed.test BOLT :: non-empty-debug-line.test This patch fixes this by using read32le for the load. Reviewed By: ayermolo Differential Revision: https://reviews.llvm.org/D148217	2023-04-13 16:39:22 +02:00
Rafael Auler	b87bf74428	[BOLT] Fix creation of invalid CFG in presence of dead code When there is a direct jump right after an indirect one, in the absence of code jumping to this direct jump, this is obviously dead code. However, BOLT was failing to recognize that by mistakenly placing both jmp instructions in the same basic block, and creating wrong successor edges. Fix that, so we can safely run UCE on that. This bug also causes validateCFG to fail and BOLT to crash if it is running ICP on that function. Reviewed By: #bolt, Amir Differential Revision: https://reviews.llvm.org/D148055	2023-04-11 17:19:39 -07:00
Alexis Engelke	0c049ea60a	[MC] Always encode instruction into SmallVector All users of MCCodeEmitter::encodeInstruction use a raw_svector_ostream to encode the instruction into a SmallVector. The raw_ostream however incurs some overhead for the actual encoding. This change allows an MCCodeEmitter to directly emit an instruction into a SmallVector without using a raw_ostream and therefore allows for performance improvements in encoding. A default path that uses existing raw_ostream implementations is provided. Reviewed By: MaskRay, Amir Differential Revision: https://reviews.llvm.org/D145791	2023-04-06 16:21:49 +02:00
spupyrev	92758a99c3	[BOLT] computing raw branch count for yaml profiles `Function.RawBranchCount` is initialized for fdata profile but not for yaml one. The diff adds the computation of the field for yaml profiles Reviewed By: Amir Differential Revision: https://reviews.llvm.org/D144211	2023-03-28 11:09:21 -07:00
Denis Revunov	22a4aaf2b0	[BOLT] Don't use section relocations when computing hash for data from other section When computing symbol hashes in BinarySection::hash, we try to find relocations in the section which reference the passed BinaryData. We do so by doing lower_bound on data begin offset and upper_bound on data end offset. Since offsets are relative to the current section, if it is a data from the previous section, we get underflow when computing offset and lower_bound returns Relocations.end(). If this data also ends where current section begins, upper_bound on zero offset will return some valid iterator if we have any relocations after the first byte. Then we'll try to iterate from lower_bound to upper_bound, since they're not equal, which in that case means we'll dereference Relocations.end(), increment it, and try to do so until we reach the second valid iterator. Of course we reach segfault earlier. In this patch we stop BOLT from searching relocations for symbols outside of the current section. Reviewed By: rafauler Differential Revision: https://reviews.llvm.org/D146620	2023-03-24 21:59:50 +03:00
Vladislav Khmelevsky	f9bf9f925e	[BOLT] Add .relr.dyn section support Vladislav Khmelevsky, Advanced Software Technology Lab, Huawei Differential Revision: https://reviews.llvm.org/D146085	2023-03-17 17:24:19 +04:00
Kazu Hirata	4e585e51c1	Use *{Map,Set}::contains (NFC)	2023-03-15 22:55:35 -07:00
Amir Ayupov	edda85771a	[BOLT][NFC] Move addRelocation{X86,AArch64} into MCPlusBuilder The two methods don't belong in BinaryFunction methods. Move the dispatch tables into target-specific MCPlusBuilder methods. Reviewed By: rafauler Differential Revision: https://reviews.llvm.org/D131813	2023-03-14 17:34:25 -07:00
Amir Ayupov	ce1061074d	[BOLT][NFC] Simplify MCPlusBuilder::getRegSize Pre-calculate the register size table in MCPlusBuilder constructor, similar to `AliasMap`/`SmallerAliasMap` in `initAliases`. Reviewed By: #bolt, rafauler Differential Revision: https://reviews.llvm.org/D145828	2023-03-14 17:26:36 -07:00
Amir Ayupov	2eae9d8eb2	[BOLT][NFC] Use llvm::is_contained Apply the replacement throughout BOLT. Reviewed By: #bolt, rafauler Differential Revision: https://reviews.llvm.org/D145464	2023-03-14 15:37:03 -07:00
Amir Ayupov	16e67e6932	[BOLT][NFC] Remove BB::getBranchInfo accepting MCSymbol ptr Reviewed By: #bolt, rafauler Differential Revision: https://reviews.llvm.org/D144924	2023-03-14 15:35:05 -07:00
Job Noorman	4875e06709	[BOLT][NFC] Improve performance of MCPlusBuilder::initAliases It was using a redundant iteration over super regs to build SmallerAliasMap. Removing this results in exactly the same alias maps and a noticeable performance gain on targets with a large number of registers. Just anecdotally: on my machine, processing a small AArch64 binary went from 2.7s down to 80ms. Reviewed By: Amir Differential Revision: https://reviews.llvm.org/D145779	2023-03-13 11:51:12 -07:00
Vladislav Khmelevsky	7117af529e	[BOLT] Improve dynamic relocations support for CI This patch fixes a few problems with supporting dynamic relocations in CI. 1. After dynamic relocations and functions have been read, search for dynamic relocations located in functions. Currently we expect them only to be relative and only to be in a constant island. Mark islands of such functions as having dynamic relocations and create a CI access symbol at the relocation offset, so that a BD is created for that location. 2. During function disassembly, when handling an address reference for a constant island, check if the referred external CI has a dynamic relocation. If it has one, we continue to refer to the original CI rather than creating a local copy. 3. After the function disassembly stage, mark functions that have a dynamic reloc in CI as non-simple. We don't want such functions to be optimized, since passes such as split function would create 2 copies of the CI, which we are currently unable to support. 4. While updating output values for a BF, search for BDs located in CI and update their output locations. 5. At the dynamic relocation patching stage, search for binary data located at the relocation offset. If it was moved, use the new relocation offset value rather than the old one. Vladislav Khmelevsky, Advanced Software Technology Lab, Huawei Differential Revision: https://reviews.llvm.org/D143748	2023-03-13 13:37:28 +04:00
Amir Ayupov	d77f96a9e0	[BOLT][NFC] Simplify BinaryFunction::setTrapOnEntry Reviewed By: #bolt, maksfb Differential Revision: https://reviews.llvm.org/D144758	2023-02-27 15:41:40 -08:00
Amir Ayupov	1c286acfb8	[BOLT] Prevent unsetting unknown control flow for split jump table In case of a function with unknown control flow but with a single jump table and a single jump table site, we attempt to match the jump table and a site and update block successors using jump table targets. Restrict this behavior for split jump tables which have targets in a fragment function. Fixes https://github.com/llvm/llvm-project/issues/60795. Reviewed By: #bolt, rafauler Differential Revision: https://reviews.llvm.org/D144602	2023-02-27 15:22:43 -08:00
Amir Ayupov	08ab4faf1a	[BOLT][NFC] Const-ify analyzeJumpTable Avoid modifying `BF`, instead set extra output parameter and modify BF in caller scope. Reviewed By: #bolt, rafauler Differential Revision: https://reviews.llvm.org/D144598	2023-02-27 15:22:35 -08:00
Maksim Panchenko	03e94f6608	[BOLT] Change call count output for ICF ICF optimization runs multiple passes and the order in which functions are folded could be dependent on the order they are being processed. This order is indeterministic as functions are intermediately stored in std::unordered_map<>. Note that this order is mostly stable, but is not guaranteed to be and can change e.g. after switching to a different C++ library implementation. Because the processing (and folding) order is indeterministic, the previous way of calculating merged function call count could produce different results. Change the way we calculate the ICF call count to make it independent of the function folding/processing order. Mostly NFC as the output binary should remain the same, the change affects only the console output. Reviewed By: yota9 Differential Revision: https://reviews.llvm.org/D144807	2023-02-27 15:21:16 -08:00
Maksim Panchenko	fb28196a64	[BOLT] Fix intermittent crash with instrumentation When createInstrumentedIndirectCall() was invoked for tail calls, we attached annotation instruction twice to the new call instruction. First in createDirectCall(), and then again while copying over the metadata operands. As a result, the annotations were not properly stripped for such calls before the call to freeAnnotations() in LowerAnnotations pass. That led to a use-after-free while restoring the offsets with setOffset() call. Reviewed By: yota9 Differential Revision: https://reviews.llvm.org/D144806	2023-02-27 14:11:10 -08:00
Vladislav Khmelevsky	b0d1f87b59	[BOLT] Fix data reorder for aarch64 Use proper relocation for aarch64 Differential Revision: https://reviews.llvm.org/D144095	2023-02-22 16:58:43 +04:00


244 Commits