clang-p2996

Author	SHA1	Message	Date
Maksim Panchenko	138e2abfeb	[BOLT] Attach ORC info to instructions in CFG Propagate Linux Kernel ORC information read from the file to the whole function CFG once the graph has been built. We have a choice to either attach ORC state annotation to every instruction, or to the first instruction in the basic block to conserve processing memory. I chose to attach to every instruction under --print-orc option which is currently on by default. Depends on D155153, D154815 Reviewed By: Amir Differential Revision: https://reviews.llvm.org/D155156	2023-07-13 11:12:54 -07:00
Maksim Panchenko	dd630d831c	[BOLT][NFC] Add post-CFG processing to MetadataRewriter interface Add MetadataRewriter::postCFGInitializer(). Reviewed By: jobnoorman Differential Revision: https://reviews.llvm.org/D155153	2023-07-13 11:09:57 -07:00
Maksim Panchenko	e6724cbd8a	[BOLT] Add reading support for Linux ORC sections Read ORC (oops rewind capability) info used for unwinding the stack by Linux Kernel. The info is stored in .orc_unwind and .orc_unwind_ip sections. There is also a related .orc_lookup section that is being populated by the kernel during runtime. Contents of the sections are sorted for quicker lookup by a post-link objtool. Unless we modify stack access instructions, we don't have to change ORC info attributed to instructions in the binary. However, we need to update instruction addresses and sort both sections based on the new layout. For pretty printing, we add "--print-orc" option that prints ORC info next to instructions in code dumps. Reviewed By: Amir Differential Revision: https://reviews.llvm.org/D154815	2023-07-13 11:07:29 -07:00
Alexander Yermolovich	790b75ea36	[BOLT][DWARF] Fix adding DW_AT_GNU_ranges_base There are cases in DWARF4 when Skeleton CU has ranges, but dwo CU doesn't. Bug was introduced in new DWARFRewriter where for DWARF4 it would fall through to DWARF5 case. Reviewed By: maksfb Differential Revision: https://reviews.llvm.org/D155033	2023-07-13 10:54:48 -07:00
Alexander Yermolovich	4940128ace	[BOLT][DWARF][NFC] Fix false positive error The DWO Unit DIE, doesn't have low_pc/high_pc, so we were printing this error for valid cases. Reviewed By: maksfb Differential Revision: https://reviews.llvm.org/D155032	2023-07-13 10:50:19 -07:00
Alexander Yermolovich	41afc42673	[BOLT][DWARF][NFC] Set initial offset of DIE Setting initial offset of DIE to input DIE. This is to make "printf" debugging easier. Reviewed By: maksfb Differential Revision: https://reviews.llvm.org/D155031	2023-07-13 10:44:44 -07:00
Maksim Panchenko	7b72920af6	[BOLT] Fix warning message Add missing EOL in a warning message. Reviewed By: rafauler Differential Revision: https://reviews.llvm.org/D154895	2023-07-11 01:00:08 -07:00
Job Noorman	f2f1e670b0	[BOLT] Make sure temp object file is always written BOLT used `ToolOutputFile::keep` to make sure the intermediary object file was written to disk for debugging purposes when `--keep-tmp` was passed. However, since an intermediary `buffer_ostream` was used to stream to, and this class only writes to its output stream in its destructor, the object file was lost whenever its destructor wouldn't run. This could happen, for example, if there is a crash while linking. This patch makes sure the object file is written to disk immediately after we're done creating it. This is very useful while debugging JITLink crashes. This patch also gets rid of creating a temporary file when `--keep-tmp` is not passed by streaming the object file directly to a `SmallString`. Reviewed By: maksfb Differential Revision: https://reviews.llvm.org/D154826	2023-07-11 09:35:36 +02:00
Kazu Hirata	e71f9d264e	[BOLT] Fix an unused-variable warning This patch fixes: bolt/lib/Core/DIEBuilder.cpp:468:18: error: unused variable 'Ref' [-Werror,-Wunused-variable]	2023-07-10 15:51:58 -07:00
Alexander Yermolovich	dcfa2ab534	[BOLT][DWARF] Change to process and write out TUs first then CUs in batches To reduce memory footprint, changed so that we process and write out TUs first, reset DIEBuilder, and then process CUs. CUs are processed in buckets. The first bucket contains all the CUs with cross-CU references. The rest are processed one at a time. clang-17 build in debug mode, by clang-17. before 8:25.81 real, 834.37 user, 86.03 sys, 0 amem, 79525064 mmem 8:02.20 real, 820.46 user, 81.81 sys, 0 amem, 79501616 mmem 7:52.69 real, 802.01 user, 83.99 sys, 0 amem, 79534392 mmem after 7:49.35 real, 822.04 user, 66.19 sys, 0 amem, 34934260 mmem 7:42.16 real, 825.46 user, 63.52 sys, 0 amem, 34951660 mmem 7:46.71 real, 821.11 user, 63.14 sys, 0 amem, 34981164 mmem Reviewed By: maksfb Differential Revision: https://reviews.llvm.org/D151909	2023-07-10 14:42:04 -07:00
Alexander Yermolovich	8362418748	[BOLT][DWARF] Output DWO files as they are being processed Changed how we handle writing out .dwo and .dwp files. We now write out DWO sections sooner and destroy DIEBuilder. This should decrease memory footprint. Ran on clang-17 build in debug mode with split-dwarf. before 8:07.49 real, 664.62 user, 69.00 sys, 0 amem, 41601612 mmem 8:07.06 real, 669.60 user, 68.75 sys, 0 amem, 41822588 mmem 8:00.36 real, 664.14 user, 66.36 sys, 0 amem, 41561548 mmem after 8:21.85 real, 682.23 user, 69.64 sys, 0 amem, 39379880 mmem 8:04.58 real, 671.62 user, 66.50 sys, 0 amem, 39735800 mmem 8:10.02 real, 680.67 user, 67.24 sys, 0 amem, 39662888 mmem Reviewed By: maksfb Differential Revision: https://reviews.llvm.org/D151908	2023-07-10 14:42:04 -07:00
Alexander Yermolovich	c33536e9c3	[BOLT][DWARF] Numerous fixes for a new DWARFRewriter * Some cleanup and minor fixes for the new debug information re-writer before moving on to productization. * The new rewriter wasn't handling binary with DWARF5 and DWARF4 with -fdebug-types-sections. * Removed dead cross cu reference code. * Added support for DW_AT_sibling. * With the new re-writer abbrev number can change which can lead to offset of Type Units changing. Before we would just copy raw data. Changed to write out Type Unit List. This is generated by gdb-add-index. * Fixed how bolt handles gdb-index generated by gdb-11 with types sections. Simplified logic that handles variations of gdb-index. * Clang can generate two type units with the same hash, but different content. LLD does not de-duplicate when ThinLTO is involved. Changed so that TU hash and offset are used to make TUs unique. * It is possible to have references within location expression to another DIE. Fixed it so that relative offset is updated correctly. * Removed all the code related to patching. * Removed dead code. Changed how we handle writing out TUs and TU Index. It now should fully work for DWARF4 and DWARF5. * Removed unused arguments from some APIs, changed return type to void, and other small cleanups. Reviewed By: maksfb Differential Revision: https://reviews.llvm.org/D151906	2023-07-10 14:42:03 -07:00
Rui Zhong	87fb0ea27e	[BOLT][DWARF] Implement new mechanism for DWARFRewriter This revision implements a new mechanism for DWARFRewriter. In the new mechanism, we adopt the same approach as DWARFLinker. By parsing debug information into IR, we can handle debug information more flexibly. The debug information updating process now relies on IR, and the IR is written out to the binary once updating has finished. A new class was added: DIEBuilder. This class is responsible for parsing debug information and raising it to the IR level. This class is also used to write out the .debug_info and .debug_abbrev sections. Since we output a brand new Abbrev section, we won't need to always convert low_pc/high_pc into ranges. When conversion does happen, we can also remove the low_pc entry. Reviewed By: maksfb, ayermolo Differential Revision: https://reviews.llvm.org/D130315	2023-07-10 14:42:03 -07:00
Kazu Hirata	544b4c6d52	[BOLT] Fix unused-variable warnings This patch fixes: bolt/lib/Core/DebugData.cpp:1669:20: error: unused variable 'StrOffset' [-Werror,-Wunused-variable] bolt/lib/Core/DebugData.cpp:1676:18: error: unused variable 'NewOffset' [-Werror,-Wunused-variable]	2023-07-09 18:04:50 -07:00
Nico Weber	de7781ea42	Revert "[DWARF][BOLT] Implement new mechanism for DWARFRewriter" This reverts commit `460a224443`. It breaks building on macOS, and it was landed with a review URL pointing to some Facebook-internal service. Also reverts a bunch of follow-ups: Revert "[BOLT][DWARF] Don't check string offsets" This reverts commit `f9d6f48c8b`. Revert "[BOLT][DWARF] Change to process and write out TUs first then CUs in batches" This reverts commit `88e95c1e4b`. Revert "[BOLT][DWARF] Output DWO files as they are being processed" This reverts commit `46ca2e3fcd`. Revert "[BOLT][DWARF] Don't check string offsets" This reverts commit `cfe4a4b04f`. Revert "[BOLT][DWARF] Numerous fixes for a new DWARFRewriter" This reverts commit `2701a661da`.	2023-07-07 08:07:01 -04:00
Alexander Yermolovich	88e95c1e4b	[BOLT][DWARF] Change to process and write out TUs first then CUs in batches Summary: To reduce memory footprint, changed so that we process and write out TUs first, reset DIEBuilder, and then process CUs. CUs are processed in buckets. The first bucket contains all the CUs with cross-CU references. The rest are processed one at a time. clang-17 build in debug mode, by clang-17. before 8:25.81 real, 834.37 user, 86.03 sys, 0 amem, 79525064 mmem 8:02.20 real, 820.46 user, 81.81 sys, 0 amem, 79501616 mmem 7:52.69 real, 802.01 user, 83.99 sys, 0 amem, 79534392 mmem after 7:49.35 real, 822.04 user, 66.19 sys, 0 amem, 34934260 mmem 7:42.16 real, 825.46 user, 63.52 sys, 0 amem, 34951660 mmem 7:46.71 real, 821.11 user, 63.14 sys, 0 amem, 34981164 mmem Differential Revision: https://phabricator.intern.facebook.com/D45883198	2023-07-06 14:21:26 -07:00
Alexander Yermolovich	46ca2e3fcd	[BOLT][DWARF] Output DWO files as they are being processed Summary: Changed how we handle writing out .dwo and .dwp files. We now write out DWO sections sooner and destroy DIEBuilder. This should decrease memory footprint. Ran on clang-17 build in debug mode with split-dwarf. before 8:07.49 real, 664.62 user, 69.00 sys, 0 amem, 41601612 mmem 8:07.06 real, 669.60 user, 68.75 sys, 0 amem, 41822588 mmem 8:00.36 real, 664.14 user, 66.36 sys, 0 amem, 41561548 mmem after 8:21.85 real, 682.23 user, 69.64 sys, 0 amem, 39379880 mmem 8:04.58 real, 671.62 user, 66.50 sys, 0 amem, 39735800 mmem 8:10.02 real, 680.67 user, 67.24 sys, 0 amem, 39662888 mmem Differential Revision: https://phabricator.intern.facebook.com/D45458889	2023-07-06 14:21:26 -07:00
Alexander Yermolovich	2701a661da	[BOLT][DWARF] Numerous fixes for a new DWARFRewriter Summary: * Some cleanup and minor fixes for the new debug information re-writer before moving on to productization. * The new rewriter wasn't handling binary with DWARF5 and DWARF4 with -fdebug-types-sections. * Removed dead cross cu reference code. * Added support for DW_AT_sibling. * With the new re-writer abbrev number can change which can lead to offset of Type Units changing. Before we would just copy raw data. Changed to write out Type Unit List. This is generated by gdb-add-index. * Fixed how bolt handles gdb-index generated by gdb-11 with types sections. Simplified logic that handles variations of gdb-index. * Clang can generate two type units with the same hash, but different content. LLD does not de-duplicate when ThinLTO is involved. Changed so that TU hash and offset are used to make TUs unique. * It is possible to have references within location expression to another DIE. Fixed it so that relative offset is updated correctly. * Removed all the code related to patching. * Removed dead code. Changed how we handle writing out TUs and TU Index. It now should fully work for DWARF4 and DWARF5. * Removed unused arguments from some APIs, changed return type to void, and other small cleanups. Test Plan: Reviewers: Subscribers: Tasks: Tags: Differential Revision: https://phabricator.intern.facebook.com/D46168257	2023-07-06 14:21:26 -07:00
Alexander Yermolovich	460a224443	[DWARF][BOLT] Implement new mechanism for DWARFRewriter Summary: This revision implements a new mechanism for DWARFRewriter. In the new mechanism, we adopt the same approach as DWARFLinker. By parsing debug information into IR, we can handle debug information more flexibly. The debug information updating process now relies on IR, and the IR is written out to the binary once updating has finished. A new class was added: DIEBuilder. This class is responsible for parsing debug information and raising it to the IR level. This class is also used to write out the .debug_info and .debug_abbrev sections. Since we output a brand new Abbrev section, we won't need to always convert low_pc/high_pc into ranges. When conversion does happen, we can also remove the low_pc entry. Differential Revision: https://phabricator.intern.facebook.com/D39484421 Tasks: T117448832	2023-07-06 14:21:26 -07:00
Alexander Yermolovich	66e943b1a9	[BOLT][DWARF] Fix for .debug_line with DWARF5 There was a bug in a code that pre-populated line string for a case where parts of .debug_line are not processed by BOLT, but copied as raw data. We were not switching sections. This resulted in parts of the binary being over-written with debug data. Reviewed By: maksfb Differential Revision: https://reviews.llvm.org/D154544	2023-07-06 11:37:08 -07:00
Maksim Panchenko	38639a8159	[BOLT][NFCI] Migrate Linux Kernel handling code to MetadataRewriter Create LinuxKernelRewriter and move kernel-specific code to this class. Depends on D154023 Reviewed By: Amir Differential Revision: https://reviews.llvm.org/D154024	2023-07-06 11:25:50 -07:00
Maksim Panchenko	43dce27c06	[BOLT][NFCI] Migrate pseudo probes to MetadataRewriter interface Use new MetadataRewriter interface to update pseudo probes and move ProbeDecoder out of BinaryContext into new PseudoProbeRewriter class. Depends on D154021 Reviewed By: Amir Differential Revision: https://reviews.llvm.org/D154022 Differential Revision: https://reviews.llvm.org/D154023	2023-07-06 11:19:30 -07:00
Maksim Panchenko	98e2d63027	[BOLT][NFCI] Use MetadataRewriter interface to update SDT markers Migrate SDT markers processing to the new MetadataRewriter interface. Depends on D154020 Reviewed By: Amir Differential Revision: https://reviews.llvm.org/D154021	2023-07-06 11:17:17 -07:00
Maksim Panchenko	c9b1f06288	[BOLT] Introduce MetadataRewriter interface Introduce the MetadataRewriter interface to handle updates for various types of auxiliary data stored in a binary file. To implement metadata processing using this new interface, all metadata rewriters should derive from the RewriterBase class and implement one or more of the following methods, depending on the timing of metadata read and write operations: * preCFGInitializer() * postCFGInitializer() // TBD * preEmitFinalizer() // TBD * postEmitFinalizer() By adopting this approach, we aim to simplify the RewriteInstance class and improve its scalability to accommodate new extensions of file formats, including various metadata types of the Linux Kernel. Differential Revision: https://reviews.llvm.org/D154020	2023-07-06 11:09:51 -07:00
Amir Ayupov	59a27170c9	[BOLT][NFC] Simplify postProcessJumpTables Reviewed By: #bolt, rafauler Differential Revision: https://reviews.llvm.org/D154115	2023-06-29 22:47:16 -07:00
Denis Revunov	f6682ad03f	[BOLT][Instrumentation] Disallow combining append-pid with sleep-time/wait-forks The point of instrumentation-sleep-time option is to have a watcher process which shares memory with all other forks and dumps a common profile each n seconds. Combining it with append-pid suggests that we should get a private profile of each fork every n seconds, but such behavior is not implemented currently and is not easy to implement in general, because we somehow need to intercept each individual fork, launch a watcher process just for that fork, and also map counters so that they're only shared with that single fork. Since we're not doing it, we just disallow such combination of options. Reviewed By: rafauler, Amir Differential Revision: https://reviews.llvm.org/D153771	2023-06-30 01:03:53 +03:00
Amir Ayupov	2f3f7d1206	[BOLT] Add -dump-cg option to dump call graph Reviewed By: #bolt, rafauler Differential Revision: https://reviews.llvm.org/D153994	2023-06-28 17:54:24 -07:00
Amir Ayupov	3fe2c21872	[BOLT][NFC] Add extra debug logging to buildCallGraph Reviewed By: #bolt, rafauler Differential Revision: https://reviews.llvm.org/D153987	2023-06-28 17:52:59 -07:00
Amir Ayupov	fd49cc87d0	[BOLT][NFC] Print functions after attaching profile (-print-profile) Add an extra point of dumping functions: immediately after attaching the profile information. This dumping is enabled by newly introduced `-print-profile` and `-print-all`. The reason is that in `aggregate-only`/perf2bolt mode BOLT may not reach the point of printing the function after CFG is constructed (`-print-cfg`), while we may still want to inspect the attached profile, especially for diff'ing purposes. Reviewed By: #bolt, rafauler Differential Revision: https://reviews.llvm.org/D153996	2023-06-28 17:51:17 -07:00
Shatian Wang	a89c9b35be	[BOLT] Fixing relative ordering of cold sections under multi-way function splitting Order code sections with names in the form of ".text.cold.i" based on the value of i. [Context] SplitFunctions.cpp implements splitting strategies that can potentially split each function into a maximum of N>2 fragments. When such N-way splitting happens, new code sections with names ".text.cold.1", ..., ".text.cold.i", ..., ".text.cold.N-2" will be created. A section with name ".text.cold.i" contains the (i+2)th fragment of each function. As an example, if each function is split into N=3 fragments: hot, warm, cold, then code sections will now include - a section with name ".text" containing hot fragments - a section with name ".text.cold" containing warm fragments - a section with name ".text.cold.1" containing cold fragments. The order of these new sections in the output binary currently depends on the order in which they are encountered by the emitter. For example, under N=3-way splitting, if the first function is 2-way split into hot and cold and the second function is 3-way split into hot, warm, and cold, then the cold fragment is encountered first, resulting in the final sections being in the following order: .text (hot), .text.cold.1 (cold), .text.cold (warm). The above is suboptimal because the distance of jumps/calls between the hot and the warm sections will be much bigger than when ordering the sections as follows: .text (hot), .text.cold (warm), .text.cold.1 (cold). This diff orders the sections with names in the form of ".text.cold" or ".text.cold.i" based on the value of i (assuming the i-value of ".text.cold" is 0). Reviewed By: rafauler Differential Revision: https://reviews.llvm.org/D152941	2023-06-22 14:26:48 -07:00
Maksim Panchenko	deb53102a7	[BOLT] Remove unnecessary diagnostics When optimizations passes do not change anything, skip their diagnostics output. NFC otherwise. Reviewed By: Amir Differential Revision: https://reviews.llvm.org/D153386	2023-06-22 14:07:00 -07:00
Job Noorman	38ba2824c8	[BOLT] Don't register internal func relocs as external references Currently, all relocations that point inside a function are registered as external references. If these relocations cannot be resolved as jump tables or computed gotos, the containing function gets marked as not-simple and excluded from optimizations. RISC-V uses relocations for branches and jumps (to support linker relaxation) and as such, almost no functions get marked as simple. This patch fixes this by only registering relocations that originate outside of the referenced function as external references. Reviewed By: rafauler Differential Revision: https://reviews.llvm.org/D153345	2023-06-22 09:35:54 +02:00
Job Noorman	b410d24a19	[BOLT][RISCV] Implement R_RISCV_ADD32/SUB32 This patch implements the R_RISCV_ADD32 and R_RISCV_SUB32 relocations for RISC-V. Reviewed By: rafauler Differential Revision: https://reviews.llvm.org/D146554	2023-06-22 09:35:54 +02:00
Job Noorman	b6556dc9fe	[BOLT][RISCV] Fix implementation of getTargetSymbol - Correctly handle OpNum == 0 (auto select operand) - Implement MCExpr overload Reviewed By: rafauler Differential Revision: https://reviews.llvm.org/D153343	2023-06-21 10:21:00 +02:00
Job Noorman	41b8aed499	[BOLT][RISCV] Implement branch reversal Reviewed By: rafauler Differential Revision: https://reviews.llvm.org/D153344	2023-06-21 10:21:00 +02:00
Job Noorman	5e67ae151e	[BOLT][RISCV] Implement return/unconditional branch creation Reviewed By: rafauler Differential Revision: https://reviews.llvm.org/D153342	2023-06-21 10:21:00 +02:00
Amir Ayupov	82ef86c194	[BOLT] Set IsRelro section attribute based on PT_GNU_RELRO segment Handle PT_GNU_RELRO segment in accordance with Linux Standard Base spec chapter 12: > PT_GNU_RELRO > The array element specifies the location and size of a segment which may > be made read-only after relocations have been processed. Perform a readelf-style mapping check between this segment and sections, set `IsRelro` section attribute. Reviewed By: #bolt, maksfb Differential Revision: https://reviews.llvm.org/D152944	2023-06-20 20:44:18 -07:00
Amir Ayupov	6c87315518	[BOLT] Sort CallSiteInfo targets by symbol name in YAMLWriter Align YAML and fdata profiles by sorting CallSiteInfo targets by symbol name, aligning it to fdata. By default, YAML CallSiteInfo is sorted by function id, which is the order of functions in the binary. Follow-up to D152731, aligning YAML with fdata, and in turn all three with each other. Reviewed By: #bolt, rafauler Differential Revision: https://reviews.llvm.org/D152733	2023-06-20 15:20:24 -07:00
Kazu Hirata	a132f5eb77	[BOLT] Fix a warning in release builds This patch fixes: bolt/lib/Core/BinarySection.cpp:120:24: error: unused variable 'Relocation' [-Werror,-Wunused-variable]	2023-06-19 20:02:47 -07:00
Job Noorman	b4bb6211a5	[BOLT] Implement composed relocations BOLT currently assumes (and asserts) that no two relocations can share the same offset. Although this is true in most cases, ELF has a feature called (not sure if this is an official term) composed relocations [1] where multiple relocations at the same offset are combined to produce a single value. For example, to support label subtraction (a - b) on RISC-V, two relocations are emitted at the same offset: - R_RISCV_ADD32 a + 0 - R_RISCV_SUB32 b + 0 which, when combined, will produce the value of (a - b). To support this in BOLT, first, RelocationSetType in BinarySection is changed to be a multiset in order to allow it to store multiple relocations at the same offset. Next, Relocation::emit() is changed to receive an iterator pair of relocations. In most cases, these will point to a single relocation in which case its behavior is unaltered by this patch. For composed relocations, they should point to all relocations at the same offset and the following happens: - A new method Relocation::createExpr() is called for every relocation. This method is essentially the same as the original emit() except that it returns the MCExpr without emitting it. - The MCExprs of relocations i and i+1 are combined using the opcode returned by the new method Relocation::getComposeOpcodeFor(). - After combining all MCExprs, the last one is emitted. Note that in the current patch, getComposeOpcodeFor() simply calls llvm_unreachable() since none of the current targets use composed relocations. This will change once the RISC-V target lands. Finally, BinarySection::emitAsData() is updated to group relocations by offset and emit them all at once. Note that this means composed relocations are only supported in data sections. Since this is the only place they seem to be used in RISC-V, I believe it's reasonable to only support them there for now to avoid further code complexity. 
[1]: https://www.sco.com/developers/gabi/latest/ch4.reloc.html Reviewed By: rafauler Differential Revision: https://reviews.llvm.org/D146546	2023-06-19 17:11:08 +02:00
Kazu Hirata	e7541f561d	[BOLT] Use llvm::is_contained (NFC)	2023-06-18 11:53:01 -07:00
Kazu Hirata	b188f9f597	[BOLT] Use {StringMap,DenseMapBase}::lookup (NFC)	2023-06-16 07:48:19 -07:00
Job Noorman	f873029386	[BOLT] Add minimal RISC-V 64-bit support Just enough features are implemented to process a simple "hello world" executable and produce something that still runs (including libc calls). This was mainly a matter of implementing support for various relocations. Currently, the following are handled: - R_RISCV_JAL - R_RISCV_CALL - R_RISCV_CALL_PLT - R_RISCV_BRANCH - R_RISCV_RVC_BRANCH - R_RISCV_RVC_JUMP - R_RISCV_GOT_HI20 - R_RISCV_PCREL_HI20 - R_RISCV_PCREL_LO12_I - R_RISCV_RELAX - R_RISCV_NONE Executables linked with linker relaxation will probably fail to be processed. BOLT relocates .text to a high address while leaving .plt at its original (low) address. This causes PC-relative PLT calls that were relaxed to a JAL to not fit their offset in an I-immediate anymore. This is something that will be addressed in a later patch. Changes to the BOLT core are relatively minor. Two things were tricky to implement and needed slightly larger changes. I'll explain those below. The R_RISCV_CALL(_PLT) relocation is put on the first instruction of a AUIPC/JALR pair, the second does not get any relocation (unlike other PCREL pairs). This causes issues with the combinations of the way BOLT processes binaries and the RISC-V MC-layer handles relocations: - BOLT reassembles instructions one by one and since the JALR doesn't have a relocation, it simply gets copied without modification; - Even though the MC-layer handles R_RISCV_CALL properly (adjusts both the AUIPC and the JALR), it assumes the immediates of both instructions are 0 (to be able to or-in a new value). This will most likely not be the case for the JALR that got copied over. To handle this difficulty without resorting to RISC-V-specific hacks in the BOLT core, a new binary pass was added that searches for AUIPC/JALR pairs and zeroes-out the immediate of the JALR. A second difficulty was supporting ABS symbols. 
As far as I can tell, ABS symbols were not handled at all, causing __global_pointer$ to break. RewriteInstance::analyzeRelocation was updated to handle these generically. Tests are provided for all supported relocations. Note that in order to test the correct handling of PLT entries, an ELF file produced by GCC had to be used. While I tried to strip the YAML representation, it's still quite large. Any suggestions on how to improve this would be appreciated. Reviewed By: rafauler Differential Revision: https://reviews.llvm.org/D145687	2023-06-16 12:19:36 +02:00
Amir Ayupov	224e4cc516	[BOLT] Sort BranchData in DataAggregator Align perf reader to fdata behavior by sorting BranchData after reading samples, in the same way as DataReader: `20c66a0c66/bolt/lib/Profile/DataReader.cpp (L1239)` Namely, that order affects CallSiteInfo annotations which determine the construction order of CallGraph, which in turn affects function reordering. Reviewed By: #bolt, rafauler Differential Revision: https://reviews.llvm.org/D152731	2023-06-15 12:08:57 -07:00
Job Noorman	05634f7346	[BOLT] Move from RuntimeDyld to JITLink RuntimeDyld has been deprecated in favor of JITLink. [1] This patch replaces all uses of RuntimeDyld in BOLT with JITLink. Care has been taken to minimize the impact on the code structure in order to ease the inspection of this (rather large) changeset. Since BOLT relied on the RuntimeDyld API in multiple places, this wasn't always possible though and I'll explain the changes in code structure first. Design note: BOLT uses a JIT linker to perform what essentially is static linking. No linked code is ever executed; the result of linking is simply written back to an executable file. For this reason, I restricted myself to the use of the core JITLink library and avoided ORC as much as possible. RuntimeDyld contains methods for loading objects (loadObject) and symbol lookup (getSymbol). Since JITLink doesn't provide a class with a similar interface, the BOLTLinker abstract class was added to implement it. It was added to Core since both the Rewrite and RuntimeLibs libraries make use of it. Wherever a RuntimeDyld object was used before, it was replaced with a BOLTLinker object. There is one major difference between the RuntimeDyld and BOLTLinker interfaces: in JITLink, section allocation and the application of fixups (relocation) happens in a single call (jitlink::link). That is, there is no separate method like finalizeWithMemoryManagerLocking in RuntimeDyld. BOLT used to remap sections between allocating (loadObject) and linking them (finalizeWithMemoryManagerLocking). This doesn't work anymore with JITLink. Instead, BOLTLinker::loadObject accepts a callback that is called before fixups are applied which is used to remap sections. The actual implementation of the BOLTLinker interface lives in the JITLinkLinker class in the Rewrite library. It's the only part of the BOLT code that should directly interact with the JITLink API. 
For loading object, JITLinkLinker first creates a LinkGraph (jitlink::createLinkGraphFromObject) and then links it (jitlink::link). For the latter, it uses a custom JITLinkContext with the following properties: - Use BOLT's ExecutableFileMemoryManager. This one was updated to implement the JITLinkMemoryManager interface. Since BOLT never executes code, its finalization step is a no-op. - Pass config: don't use the default target passes since they modify DWARF sections in a way that seems incompatible with BOLT. Also run a custom pre-prune pass that makes sure sections without symbols are not pruned by JITLink. - Implement symbol lookup. This used to be implemented by BOLTSymbolResolver. - Call the section mapper callback before the final linking step. - Copy symbol values when the LinkGraph is resolved. Symbols are stored inside JITLinkLinker to ensure that later objects (i.e., instrumentation libraries) can find them. This functionality used to be provided by RuntimeDyld but I did not find a way to use JITLink directly for this. Some more minor points of interest: - BinarySection::SectionID: JITLink doesn't have something equivalent to RuntimeDyld's Section IDs. Instead, sections can only be referred to by name. Hence, SectionID was updated to a string. - There seem to be no tests for Mach-O. I've tested a small hello-world style binary but not more than that. - On Mach-O, JITLink "normalizes" section names to include the segment name. I had to parse the section name back from this manually which feels slightly hacky. [1] https://reviews.llvm.org/D145686#4222642 Reviewed By: rafauler Differential Revision: https://reviews.llvm.org/D147544	2023-06-15 11:13:52 +02:00
Maksim Panchenko	1ebad216ef	[BOLT][NFCI] Remove redundant instance of MCAsmBackend Use instance of MCAsmBackend from BinaryContext instead of creating a new one. Reviewed By: Amir Differential Revision: https://reviews.llvm.org/D152849	2023-06-13 13:14:05 -07:00
Maksim Panchenko	5c4d306a10	[BOLT][NFC] Change signature of MCPlusBuilder::isUnsupportedBranch() Make MCPlusBuilder::isUnsupportedBranch() take MCInst, not opcode. Reviewed By: Amir Differential Revision: https://reviews.llvm.org/D152765	2023-06-13 12:20:36 -07:00
Maksim Panchenko	c4e60a7f60	[BOLT] Fix --max-funcs=<N> option Fix off-by-one error while handling of the --max-funcs=<N> option. We used to process N+1 functions when N was requested. Reviewed By: Amir Differential Revision: https://reviews.llvm.org/D152751	2023-06-12 16:54:14 -07:00
Maksim Panchenko	43f56a2f27	[BOLT] Fix handling of code references from unmodified code In lite mode (default for X86), BOLT optimizes and relocates functions with profile. The rest of the code is preserved, but if it references relocated code such references have to be updated. The update is handled by scanExternalRefs() function. Note that we cannot solely rely on relocations written by the linker, as not all code references are exposed to the linker. Additionally, the linker can modify certain instructions and relocations will no longer match the code. With this change, start using symbolic disassembler for scanning code for references in scanExternalRefs(). Unlike the previous approach, the symbolizer properly detects and creates references for instructions with multiple/ambiguous symbolic operands and handles cases where a relocation doesn't match any operand. See test cases for examples. Reviewed By: Amir Differential Revision: https://reviews.llvm.org/D152631	2023-06-12 10:46:51 -07:00
Amir Ayupov	702fe36b70	[BOLT][NFC] Const-ify getDynamicRelocationAt Reviewed By: #bolt, maksfb Differential Revision: https://reviews.llvm.org/D152662	2023-06-12 09:55:16 -07:00
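
The section ordering described in commit a89c9b35be above (sort ".text.cold" and ".text.cold.i" by the value of i, with ".text.cold" counting as i = 0) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from that commit: the helper names `coldSectionIndex` and `orderCodeSections` are hypothetical, and placing all non-cold sections before the cold ones is an assumption made here to keep the example self-contained.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper (not BOLT's API): returns the i-value of a cold code
// section name, treating ".text.cold" as i = 0 and ".text.cold.<i>" as i.
// Returns std::nullopt for any other name.
std::optional<unsigned> coldSectionIndex(const std::string &Name) {
  static const std::string Prefix = ".text.cold";
  if (Name.compare(0, Prefix.size(), Prefix) != 0)
    return std::nullopt;
  if (Name.size() == Prefix.size())
    return 0u;
  // Anything after the prefix must be '.' followed by at least one digit.
  if (Name[Prefix.size()] != '.' || Name.size() == Prefix.size() + 1)
    return std::nullopt;
  unsigned Index = 0;
  for (std::size_t I = Prefix.size() + 1; I < Name.size(); ++I) {
    if (Name[I] < '0' || Name[I] > '9')
      return std::nullopt;
    Index = Index * 10 + static_cast<unsigned>(Name[I] - '0');
  }
  return Index;
}

// Hypothetical helper: reorders section names so that cold sections appear
// in ascending i. Assumption of this sketch: non-cold sections keep their
// relative order and are placed before all cold sections.
void orderCodeSections(std::vector<std::string> &Names) {
  auto Rank = [](const std::string &Name) -> long {
    std::optional<unsigned> Index = coldSectionIndex(Name);
    return Index ? static_cast<long>(*Index) : -1L;
  };
  std::stable_sort(Names.begin(), Names.end(),
                   [&](const std::string &A, const std::string &B) {
                     return Rank(A) < Rank(B);
                   });
}
```

With the emitter order from the commit message, `{".text", ".text.cold.1", ".text.cold"}`, calling `orderCodeSections` yields `.text`, `.text.cold`, `.text.cold.1`, which keeps the warm section adjacent to the hot one.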


531 Commits