clang-p2996

Author	SHA1	Message	Date
Alexander Yermolovich	ad4cead67c	[BOLT][DWARF][NFC] Initialize CloneUnitCtxMap with current partition size (#75876 ) We would always allocate maximum amount for vector containing DWARFUnitInfo. In real use cases what ends up happening is we allocate a giant vector when processing one CU, or for thin-lto case multiple CUs. This led to a lot of memory overhead, and 2x BOLT processing slowdown for at least one service built with monolithic DWARF. For binaries built with LTO with clang all of CUs that have cross references will share an abbrev table and will be processed in one batch. Rest of CUs are processed in --cu-processing-batch-size size, which defaults to 1. For theoretical cases where cross-cu references are present, but they do not share abbrev, we will increase the size of CloneUnitCtxMap as each CU is being processed.	2023-12-20 16:12:52 -08:00
Alexander Yermolovich	bf2b035e58	[BOLT][DWARF] Fix handling .debug_str_offsets for type units (#75522 ) There was an assumption that TUs and CUs share .debug_str_offsets contribution. For ThinLTO builds it is not the case. Changed so that we parse contributions for TUs also, and did some refactoring so that we don't re-parse contributions that were not modified.	2023-12-14 17:27:21 -08:00
Kazu Hirata	ad8fd5b185	[BOLT] Use StringRef::{starts,ends}_with (NFC) This patch replaces uses of StringRef::{starts,ends}with with StringRef::{starts,ends}_with for consistency with std::{string,string_view}::{starts,ends}_with in C++20. I'm planning to deprecate and eventually remove StringRef::{starts,ends}with.	2023-12-13 23:34:49 -08:00
Alexander Yermolovich	fb9a851224	[BOLT][DWARF] Fix handling of debug_str_offsets (#75100 ) We were not setting size field of .debug_str_offsets correctly. Fixed it, and added a test.	2023-12-11 15:56:32 -08:00
Kazu Hirata	1cc5431285	[BOLT] Fix warnings This patch fixes: bolt/lib/Core/BinaryFunctionProfile.cpp:222:10: error: variable 'BBMergeSI' set but not used [-Werror,-Wunused-but-set-variable] bolt/lib/Passes/VeneerElimination.cpp:67:12: error: variable 'VeneerCallers' set but not used [-Werror,-Wunused-but-set-variable]	2023-12-11 12:55:29 -08:00
Amir Ayupov	b039ccc684	[BOLT] Provide backwards compatibility for YAML profile with std::hash (#74253 ) Provide backwards compatibility for YAML profile that uses `std::hash`: xxh3 hash is the default for newly produced profile (sets `std-hash: false`), whereas the profile that doesn't specify `std-hash` will be treated as `std-hash: true`, preserving old behavior.	2023-12-11 12:27:32 -08:00
Nathan Sidwell	9596676e65	[BOLT] Determine address size from binary (#74870 ) Query the executable for address size.	2023-12-09 14:39:57 -05:00
ShatianWang	56bbf8135e	[BOLT] CDSplit main logic part 1/2 (#73895 ) This diff defines and initializes auxiliary variables used by CDSplit and implements two important helper functions. The first helper function approximates the block level size increase if a function is hot-warm split at a given split index (X86 specific). The second helper function finds all calls in the form of X->Y or Y->X for each BF given function order [... X ... BF ... Y ...]. These calls are referred to as "cover calls". Their distance will decrease if BF's hot fragment size is further reduced by hot-warm splitting. NFC.	2023-11-30 20:55:36 -05:00
Maksim Panchenko	4f3081296f	[BOLT][NFC] Fix comment (#73983 ) Fix off-by-one error in comment.	2023-11-30 14:31:38 -08:00
ShatianWang	c43d0432ef	[BOLT] Create .text.warm for 3-way splitting (#73863 ) This commit explicitly adds a warm code section, .text.warm, when -split-functions -split-strategy=cdsplit is used. This replaces the previous approach of using .text.cold.0 as warm and .text.cold.1 as cold in 3-way function splitting. NFC.	2023-11-29 22:42:36 -05:00
Maksim Panchenko	4bcbbe1f70	[BOLT] Refactor fixBranches() (#73752 ) Simplify code in fixBranches(). Mostly NFC, except the x86-specific check for code fragments now takes into account presence of more than two fragments. Should only matter when we split code into multiple fragments and can run fixBranches() more than once. Also, don't replace a branch target with the same one, as such operation may allocate memory for extra MCSymbolRefExpr.	2023-11-29 16:24:16 -08:00
Alexander Yermolovich	00dbea7c73	[BOLT][DWARF][NFC] Added const to variable (#73731 ) Nit followup to 72729.	2023-11-28 17:30:28 -08:00
Alexander Yermolovich	b47b3bee7b	[BOLT][DWARF] Fix handling of DWARF5 DWP (#72729 ) Fixed handling of DWP as input. Before BOLT crashed. Now it will write out correct CU, and all the TUs. Potential future improvement is to scan all the TUs used in this CU, and only include those.	2023-11-28 15:54:14 -08:00
spupyrev	e7dd596c68	[BOLT] Use deterministic xxh3 for computing BF/BB hashes (#72542 ) std::hash and ADT/Hashing::hash_value are non-deterministic functions whose results might vary across implementation/process/execution. Using xxh3 instead for computing hashes of BinaryFunctions and BinaryBasicBlock for stale profile matching. (A possible alternative is to use ADT/StableHashing.h based on FNV hashing but xxh3 seems to be more popular in LLVM) This is to address https://github.com/llvm/llvm-project/issues/65241.	2023-11-27 14:45:46 -08:00
Maksim Panchenko	f4834255d3	[BOLT] Reset output addresses for deleted blocks (#73429 ) This is a follow-up to #73076. We need to reset output addresses for deleted blocks, otherwise the address translation may mistakenly attribute input address of a deleted block to a non-zero address. While working on a test case, I've discovered that DWARF output ranges were already broken for deleted basic blocks: #73428. I will provide a test case for this PR with a DWARF address range fix PR.	2023-11-25 23:23:47 -08:00
Maksim Panchenko	365114292a	[BOLT][NFC] Refactor function state check (#73420 ) Remove redundant check in updateOutputValues().	2023-11-25 21:09:54 -08:00
ShatianWang	d333c0e062	[BOLT] Extend calculateEmittedSize() for block size calculation (#73076 ) This commit modifies BinaryContext::calculateEmittedSize() to update the BinaryBasicBlock::OutputAddressRange of each basic block in the function in place. BinaryBasicBlock::getOutputSize() now gives the emitted size of the basic block.	2023-11-23 15:28:31 -05:00
llongint	f3e54f2f97	[BOLT][NFC] Extract a function for dump MCInst (#67225 ) In GDB debugging, obtaining the assembly representation of MCInst is more intuitive.	2023-11-21 20:30:44 +08:00
Maksim Panchenko	84602066a6	[BOLT] Fix C++ exceptions when LPStart is specified (#72737 ) Whenever LPStartEncoding was different from DW_EH_PE_omit, we used to miscalculate LPStart. As a result, landing pads were assigned wrong addresses. Fix that.	2023-11-20 20:55:38 -08:00
Maksim Panchenko	f653f6d57a	[BOLT][NFC] Delete unused declarations (#72596 )	2023-11-16 23:36:19 -08:00
JohnLee1243	ae51ec84bb	[Bolt] Solving pie support issue (#65494 ) PIE is supported by default after clang 14. This causes a parsing error when using perf2bolt, because the base address is not computed correctly. Fix the method of getting the base address. If SegInfo.Alignment is not equal to pagesize, alignDown(SegInfo.FileOffset, SegInfo.Alignment) may not be equal to FileOffset. So SegInfo.FileOffset and FileOffset should both be aligned by SegInfo.Alignment first and then compared for equality. The .text segment's offset from the base address in VAS is aligned by pagesize. So MMapAddress's offset from the base address is alignDown(SegInfo.Address, pagesize) instead of alignDown(SegInfo.Address, SegInfo.Alignment), and the base address calculation is changed accordingly. Co-authored-by: Li Zhuohang <lizhuohang3@huawei.com>	2023-11-16 15:05:06 +08:00
Vladislav Khmelevsky	5b59540661	[BOLT] Enhance fixed indirect branch handling (#71324 ) Previously HasFixedIndirectBranch was set in BF to set isSimple to false later because of the unreachable BB elimination pass, which might remove the BB with its symbols accessed by instructions other than calls. It seems that a better solution would be to add an extra entry point on the target offset instead of marking BF as non-simple.	2023-11-16 09:30:55 +04:00
Maksim Panchenko	e823136d43	[BOLT] Refactor --keep-nops option. NFC. (#72228 ) Run RemoveNops pass only if --keep-nops is set to false (default).	2023-11-14 11:28:13 -08:00
Maksim Panchenko	f633f325a1	[BOLT] Fix NOP instruction emission on x86 (#72186 ) Use MCAsmBackend::writeNopData() interface to emit NOP instructions on x86. There are multiple forms of NOP instruction on x86 with different sizes. Currently, LLVM's assembly/disassembly does not support all forms correctly which can lead to a breakage of input code semantics, e.g. if the program relies on NOP instructions for reserving a patch space. Add "--keep-nops" option to preserve NOP instructions.	2023-11-13 18:12:39 -08:00
Maksim Panchenko	2db9b6a93f	[BOLT] Make instruction size a first-class annotation (#72167 ) When NOP instructions are used to reserve space in the code, e.g. for patching, it becomes critical to preserve their original size while emitting the code. On x86, we rely on "Size" annotation for NOP instructions size, as the original instruction size is lost in the disassembly/assembly process. This change makes instruction size a first-class annotation and is effectively NFCI. A follow-up diff will use the annotation for code emission.	2023-11-13 14:33:39 -08:00
Vladislav Khmelevsky	c6c04a83a7	[BOLT] Run EliminateUnreachableBlocks in parallel (#71299 ) The wall time for this pass decreased on my laptop from ~80 sec to 5 sec processing the clang.	2023-11-10 00:46:04 +04:00
spaette	1a2f83366b	[BOLT] Fix typos (#68121 ) Closes https://github.com/llvm/llvm-project/issues/63097 Before merging please make sure the change to bolt/include/bolt/Passes/StokeInfo.h is correct. bolt/include/bolt/Passes/StokeInfo.h ```diff // This Pass solves the two major problems to use the Stoke program without - // proting its code: + // probing its code: ``` I'm still not happy about the awkward wording in this comment. bolt/include/bolt/Passes/FixRelaxationPass.h ``` $ ed -s bolt/include/bolt/Passes/FixRelaxationPass.h <<<'9,12p' // This file declares the FixRelaxations class, which locates instructions with // wrong targets and fixes them. Such problems usually occures when linker // relaxes (changes) instructions, but doesn't fix relocations types properly // for them. $ ``` bolt/docs/doxygen.cfg.in bolt/include/bolt/Core/BinaryContext.h bolt/include/bolt/Core/BinaryFunction.h bolt/include/bolt/Core/BinarySection.h bolt/include/bolt/Core/DebugData.h bolt/include/bolt/Core/DynoStats.h bolt/include/bolt/Core/Exceptions.h bolt/include/bolt/Core/MCPlusBuilder.h bolt/include/bolt/Core/Relocation.h bolt/include/bolt/Passes/FixRelaxationPass.h bolt/include/bolt/Passes/InstrumentationSummary.h bolt/include/bolt/Passes/ReorderAlgorithm.h bolt/include/bolt/Passes/StackReachingUses.h bolt/include/bolt/Passes/StokeInfo.h bolt/include/bolt/Passes/TailDuplication.h bolt/include/bolt/Profile/DataAggregator.h bolt/include/bolt/Profile/DataReader.h bolt/lib/Core/BinaryContext.cpp bolt/lib/Core/BinarySection.cpp bolt/lib/Core/DebugData.cpp bolt/lib/Core/DynoStats.cpp bolt/lib/Core/Relocation.cpp bolt/lib/Passes/Instrumentation.cpp bolt/lib/Passes/JTFootprintReduction.cpp bolt/lib/Passes/ReorderData.cpp bolt/lib/Passes/RetpolineInsertion.cpp bolt/lib/Passes/ShrinkWrapping.cpp bolt/lib/Passes/TailDuplication.cpp bolt/lib/Rewrite/BoltDiff.cpp bolt/lib/Rewrite/DWARFRewriter.cpp bolt/lib/Rewrite/RewriteInstance.cpp bolt/lib/Utils/CommandLineOpts.cpp bolt/runtime/instr.cpp 
bolt/test/AArch64/got-ld64-relaxation.test bolt/test/AArch64/unmarked-data.test bolt/test/X86/Inputs/dwarf5-cu-no-debug-addr-helper.s bolt/test/X86/Inputs/linenumber.cpp bolt/test/X86/double-jump.test bolt/test/X86/dwarf5-call-pc-function-null-check.test bolt/test/X86/dwarf5-split-dwarf4-monolithic.test bolt/test/X86/dynrelocs.s bolt/test/X86/fallthrough-to-noop.test bolt/test/X86/tail-duplication-cache.s bolt/test/runtime/X86/instrumentation-ind-calls.s	2023-11-09 11:29:46 -08:00
Maksim Panchenko	11f52f783a	[BOLT][DWARF] Fix invalid address ranges (#71474 ) When NOP instructions are removed by BOLT and a DWARF address range falls past the removed instructions, it may lead to invalid DWARF ranges in the output binary. E.g. the range may fall outside of the basic block boundaries. This fix makes sure the modified range fits within the containing basic block. A proper fix requires tracking instructions within the block and will come in a different PR.	2023-11-09 09:55:49 -08:00
Maksim Panchenko	254ccb95e8	[BOLT] Follow-up to "Fix incorrect basic block output addresses" (#71630 ) In `8244ff6739`, I've introduced an assertion that incorrectly used BasicBlock::empty(). Some basic blocks may contain only pseudo instructions and thus BB->empty() will evaluate to false, while the actual code size will be zero.	2023-11-08 10:53:36 -08:00
Job Noorman	96b5e092dc	[BOLT] Support instrumentation hook via DT_FINI_ARRAY (#67348 ) BOLT currently hooks its instrumentation finalization function via `DT_FINI`. However, this method of calling finalization routines is not supported anymore on newer ABIs like RISC-V. `DT_FINI_ARRAY` is preferred there. This patch adds support for hooking into `DT_FINI_ARRAY` instead if the binary does not have a `DT_FINI` entry. If it does, `DT_FINI` takes precedence so this patch should not change how the currently supported instrumentation targets behave. `DT_FINI_ARRAY` points to an array in memory of `DT_FINI_ARRAYSZ` bytes. It consists of pointer-length entries that contain the addresses of finalization functions. However, the addresses are only filled-in by the dynamic linker at load time using relative relocations. This makes hooking via `DT_FINI_ARRAY` a bit more complicated than via `DT_FINI`. The implementation works as follows: - While scanning the binary: find the section where `DT_FINI_ARRAY` points to, read its first dynamic relocation and use its addend to find the address of the fini function we will use to hook; - While writing the output file: overwrite the addend of the dynamic relocation with the address of the runtime library's fini function. Updating the dynamic relocation required a bit of boilerplate: since dynamic relocations are stored in a `std::multiset` which doesn't support getting mutable references to its items, functions were added to `BinarySection` to take an existing relocation and insert a new one.	2023-11-08 11:01:10 +00:00
Maksim Panchenko	d18b4f882b	[BOLT] Fix build after `0df1546`	2023-11-06 15:09:54 -08:00
Maksim Panchenko	0df154671b	[BOLT] Use Label annotation instead of EHLabel pseudo. NFCI. (#70179 ) When we need to attach EH label to an instruction, we can now use Label annotation instead of EHLabel pseudo instruction.	2023-11-06 14:43:14 -08:00
Maksim Panchenko	b336d741d0	[BOLT] Use direct storage for Label annotations. NFCI. (#70147 ) Store the Label annotation directly in the operand and avoid the extra allocation and indirection overheads associated with MCSimpleAnnotation.	2023-11-06 14:24:55 -08:00
maksfb	74e0a26fd1	[BOLT] Modify MCPlus annotation internals. NFCI. (#70412 ) When annotating MCInst instructions, attach extra annotation operands directly to the annotated instruction, instead of attaching them to an instruction pointed to by a special kInst operand. With this change, it's no longer necessary to allocate MCInst and most of the first-class annotations come with free memory as currently MCInst is declared with: SmallVector<MCOperand, 10> Operands; i.e. more operands than are normally being used. We still create a kInst operand with a nullptr instruction value to designate the beginning of annotation operands. However, this special operand might not be needed if we can rely on MCInstrDesc::NumOperands.	2023-11-06 12:14:22 -08:00
maksfb	e28c393bd1	[BOLT] Reduce the number of emitted symbols. NFCI. (#70175 ) We emit a symbol before an instruction for a number of reasons, e.g. for tracking LocSyms, debug line, or if the instruction has a label annotation. Currently, we may emit multiple symbols per instruction. Reuse the same label instead of creating and emitting new ones when possible. I'm planning to refactor EH labels as well in a separate diff. Change getLabel() to return a pointer instead of std::optional<> since an empty label should be treated identically to no label.	2023-11-06 11:41:47 -08:00
maksfb	7f031d1c7c	[BOLT] Fix address mapping for ICP code (#70136 ) When we create new code for indirect code promotion optimization, we should mark it as originating from the indirect jump instruction for BOLT address translation (BAT) to map it to the original instruction.	2023-11-06 11:25:49 -08:00
maksfb	6e26246c22	[BOLT][DWARF] Refactor address ranges processing (#71225 ) Create BinaryFunction::translateInputToOutputRange() and use it for updating DWARF debug ranges and location lists while de-duplicating the existing code. Additionally, move DWARF-specific code out of BinaryFunction and add print functions to facilitate debugging. Note that this change is deliberately kept "bug-level" compatible with the existing solution to keep it NFCI and make it easier to track any possible regressions in the future updates to the ranges-handling code.	2023-11-06 11:10:20 -08:00
Vladislav Khmelevsky	838331a081	[BOLT] Set NOOP size only on X86 (NFC) (#71307 ) Small fix, we have problems with noop size only on x86, no reason to do it on other platforms.	2023-11-06 15:21:50 +04:00
maksfb	8244ff6739	[BOLT] Fix incorrect basic block output addresses (#70000 ) Some optimization passes may duplicate basic blocks and assign the same input offset to a number of different blocks in a function. This is done e.g. to correctly map debugging ranges for duplicated code. However, duplicate input offsets present a problem when we use AddressMap to generate new addresses for basic blocks. The output address is calculated based on the input offset and will be the same for blocks with identical offsets. The result is potentially incorrect debug info and BAT records. To address the issue, we have to eliminate the dependency on input offsets while generating output addresses for a basic block. Each block has a unique label, hence we extend AddressMap to include address lookup based on MCSymbol and use the new functionality to update block addresses.	2023-10-24 12:22:43 -07:00
Amir Ayupov	3a72bcbf33	[BOLT] Fix build issues after #69836 (#70087 ) Fix clang build (`return Error => return std::move(Error)`)	2023-10-24 11:47:12 -07:00
Job Noorman	86bc486785	[BOLT][RISCV] Use target features from object file (#69836 ) We used to hard-code target features for RISC-V. However, most features (with the exception of relax) are stored in the object file. This patch extracts those features to ensure BOLT's output doesn't use any features not present in the input file.	2023-10-23 06:40:25 +00:00
Job Noorman	6795bfce4d	[BOLT][RISCV] Handle CIE's produced by GNU as (#69578 ) On RISC-V, GNU as produces the following initial instruction in CIE's: ``` DW_CFA_def_cfa_register: r2 ``` While I believe it is technically illegal to use this instruction without first using a `DW_CFA_def_cfa` (since the offset is undefined), both `readelf` and `llvm-dwarfdump` accept this and implicitly set the offset to 0. In BOLT, however, this triggers an assert (in `CFISnapshot::advanceTo`) as it (correctly) believes the offset is not set. This patch fixes this by setting the offset to 0 whenever executing `DW_CFA_def_cfa_register` while the offset is undefined. Note that this is probably the simplest workaround but it has a downside: while emitting CFI start, we check if the initial instructions are contained within `MCAsmInfo::getInitialFrameState` and omit them if they are. This will not be true for GNU CIE's (since they differ from LLVM's) which causes an unnecessary `DW_CFA_def_cfa_register` to be emitted. While technically correct, it would probably be better to replace the GNU CIE with the one used by LLVM to avoid this situation. This would solve the problem this patch solves while also preventing unnecessary CFI instructions. However, this is a bit trickier to implement correctly so I propose to keep this for a later time. Note on testing: the test creates a simple function with three basic blocks and forces the CFI state of the last one to be different from the others using an arbitrary CFI instruction. Then, `--reorder-blocks=reverse` is used to force `CFISnapshot::advanceTo` to be called. This causes an assert on the current main branch.	2023-10-20 15:49:17 +00:00
Kazu Hirata	eab5d337f0	[BOLT] Use llvm::erase_if (NFC)	2023-10-13 18:22:44 -07:00
Job Noorman	c6f065d9d9	[BOLT][RISCV] Recognize mapping syms with encoded ISA (#68964 ) RISC-V supports mapping syms for code that encode the exact ISA for which the code is valid. They have the form `$x<ISA>` where `<ISA>` is the textual encoding of an ISA specification. BOLT currently doesn't recognize these mapping symbols causing many binaries compiled with newer versions of GCC (which emits them) to not be properly processed. This patch makes sure BOLT recognizes them as code markers. Note that LLVM does not emit these kinds of mapping symbols yet so the test is based on a binary produced by GCC.	2023-10-13 10:34:13 +00:00
Kazu Hirata	4a0ccfa865	Use llvm::endianness::{big,little,native} (NFC) Note that llvm::support::endianness has been renamed to llvm::endianness while becoming an enum class as opposed to an enum. This patch replaces support::{big,little,native} with llvm::endianness::{big,little,native}.	2023-10-12 21:21:45 -07:00
Vladislav Khmelevsky	b7944f7c04	[BOLT] Return proper minimal alignment from BF (#67707 ) Currently minimal alignment of function is hardcoded to 2 bytes. Add 2 more cases: 1. In case BF is data in code return the alignment of CI as minimal alignment 2. For aarch64 and riscv platforms return the minimal value of 4 (added test for aarch64) Otherwise fallback to returning the 2 as it previously was.	2023-10-12 09:33:08 +04:00
Job Noorman	da37139ac9	[BOLT][NFC] Add allocator id to MCPlusBuilder::setLabel (#68707 ) This will be needed for some RISC-V instrumentation functions and is also consistent with other annotation setters.	2023-10-11 07:25:46 +00:00
Job Noorman	ff5e2babcb	[BOLT] Improve handling of relocations targeting specific instructions (#66395 ) On RISC-V, there are certain relocations that target a specific instruction instead of a more abstract location like a function or basic block. Take the following example that loads a value from symbol `foo`: ``` nop 1: auipc t0, %pcrel_hi(foo) ld t0, %pcrel_lo(1b)(t0) ``` This results in two relocations: - auipc: `R_RISCV_PCREL_HI20` referencing `foo`; - ld: `R_RISCV_PCREL_LO12_I` referencing the local label `1` which points to the auipc instruction. It is of utmost importance that the `R_RISCV_PCREL_LO12_I` keeps referring to the auipc instruction; if not, the program will fail to assemble. However, BOLT currently does not guarantee this. BOLT currently assumes that all local symbols are jump targets and always starts a new basic block at symbol locations. The example above results in a CFG that looks like this: ``` .BB0: nop .BB1: auipc t0, %pcrel_hi(foo) ld t0, %pcrel_lo(.BB1)(t0) ``` While this currently works (i.e., the `R_RISCV_PCREL_LO12_I` relocation points to the correct instruction), it has two downsides: - Too many basic blocks are created (the example above is logically only one yet two are created); - If instructions are inserted in `.BB1` (e.g., by instrumentation), things will break since the label will not point to the auipc anymore. This patch proposes to fix this issue by teaching BOLT to track labels that should always point to a specific instruction. 
This is implemented as follows: - Add a new annotation type (`kLabel`) that allows us to annotate instructions with an `MCSymbol *`; - Whenever we encounter a relocation type that is used to refer to a specific instruction (`Relocation::isInstructionReference`), we register it without a symbol; - During disassembly, whenever we encounter an instruction with such a relocation, create a symbol for its target and store it in an offset to symbol map (to ensure multiple relocations referencing the same instruction use the same label); - After disassembly, iterate this map to attach labels to instructions via the new annotation type; - During emission, emit these labels right before the instruction. I believe the use of annotations works quite well for this use case as it allows us to reliably track instruction labels. If we were to store them as offsets in basic blocks, it would be error prone to keep them updated whenever instructions are inserted or removed. I have chosen to add labels as first-class annotations (as opposed to a generic one) because the documentation of `MCAnnotation` suggests that generic annotations are to be used for optional metadata that can be discarded without affecting correctness. As this is not the case for labels, a first-class annotation seemed more appropriate.	2023-10-06 06:46:16 +00:00
Job Noorman	7fa33773e3	[BOLT][RISCV] Handle long tail calls (#67098 ) Long tail calls use the following instruction sequence on RISC-V: ``` 1: auipc xi, %pcrel_hi(sym) jalr zero, %pcrel_lo(1b)(xi) ``` Since the second instruction in isolation looks like an indirect branch, this confused BOLT and most functions containing a long tail call got marked with "unknown control flow" and didn't get optimized as a consequence. This patch fixes this by detecting long tail call sequence in `analyzeIndirectBranch`. `FixRISCVCallsPass` also had to be updated to expand long tail calls to `PseudoTAIL` instead of `PseudoCALL`. Besides this, this patch also fixes a minor issue with compressed tail calls (`c.jr`) not being detected. Note that I had to change `BinaryFunction::postProcessIndirectBranches` slightly: the documentation of `MCPlusBuilder::analyzeIndirectBranch` mentions that the [`Begin`, `End`) range contains the instructions immediately preceding `Instruction`. However, in `postProcessIndirectBranches`, all the instructions in the BB were passed in the range. This made it difficult to find the preceding instruction so I made sure only the preceding instructions are passed.	2023-10-05 08:55:30 +00:00
Job Noorman	c7d6d62252	[BOLT][RISCV] Implement TLS le/ie relocations (#67112 ) Handle the following relocations related to TLS local-exec and initial-exec: - R_RISCV_TLS_GOT_HI20 - R_RISCV_TPREL_HI20 - R_RISCV_TPREL_ADD - R_RISCV_TPREL_LO12_I - R_RISCV_TPREL_LO12_S In addition, GNU ld has a quirk where after TLS le relaxation, two unofficial relocation types may be emitted: - R_RISCV_TPREL_I - R_RISCV_TPREL_S Since they are unofficial (defined in the reserved range of relocation types), LLVM does not define them. Hence, I've defined them locally in BOLT in a private namespace.	2023-10-05 08:53:51 +00:00


322 Commits