clang-p2996

Author	SHA1	Message	Date
ShatianWang	1577483413	[BOLT] Don't split likely fallthrough in CDSplit (#76164 ) This diff speeds up CDSplit by not considering any hot-warm splitting point that could break a fall-through branch from a basic block to its most likely successor. Co-authored-by: spupyrev <spupyrev@fb.com>	2023-12-21 16:17:10 -05:00
Alexander Yermolovich	ad4cead67c	[BOLT][DWARF][NFC] Initialize CloneUnitCtxMap with current partition size (#75876 ) We would always allocate the maximum amount for the vector containing DWARFUnitInfo. In real use cases what ends up happening is we allocate a giant vector when processing one CU, or for thin-lto case multiple CUs. This led to a lot of memory overhead, and 2x BOLT processing slowdown for at least one service built with monolithic DWARF. For binaries built with LTO with clang all of CUs that have cross references will share an abbrev table and will be processed in one batch. Rest of CUs are processed in --cu-processing-batch-size size. Which defaults to 1. For theoretical cases where cross-cu references are present, but they do not share abbrev will increase the size of CloneUnitCtxMap as each CU is being processed.	2023-12-20 16:12:52 -08:00
Alexander Yermolovich	bf2b035e58	[BOLT][DWARF] Fix handling .debug_str_offsets for type units (#75522 ) There was an assumption that TUs and CUs share .debug_str_offsets contribution. For ThinLTO builds it is not the case. Changed so that we parse contributions for TUs also, and did some refactoring so that we don't re-parse contributions that were not modified.	2023-12-14 17:27:21 -08:00
Kazu Hirata	ad8fd5b185	[BOLT] Use StringRef::{starts,ends}_with (NFC) This patch replaces uses of StringRef::{starts,ends}with with StringRef::{starts,ends}_with for consistency with std::{string,string_view}::{starts,ends}_with in C++20. I'm planning to deprecate and eventually remove StringRef::{starts,ends}with.	2023-12-13 23:34:49 -08:00
Alexander Yermolovich	fb9a851224	[BOLT][DWARF] Fix handling of debug_str_offsets (#75100 ) We were not setting size field of .debug_str_offsets correctly. Fixed it, and added a test.	2023-12-11 15:56:32 -08:00
Kazu Hirata	1cc5431285	[BOLT] Fix warnings This patch fixes: bolt/lib/Core/BinaryFunctionProfile.cpp:222:10: error: variable 'BBMergeSI' set but not used [-Werror,-Wunused-but-set-variable] bolt/lib/Passes/VeneerElimination.cpp:67:12: error: variable 'VeneerCallers' set but not used [-Werror,-Wunused-but-set-variable]	2023-12-11 12:55:29 -08:00
Amir Ayupov	b039ccc684	[BOLT] Provide backwards compatibility for YAML profile with std::hash (#74253 ) Provide backwards compatibility for YAML profile that uses `std::hash`: xxh3 hash is the default for newly produced profile (sets `std-hash: false`), whereas the profile that doesn't specify `std-hash` will be treated as `std-hash: true`, preserving old behavior.	2023-12-11 12:27:32 -08:00
sinan	fdb13cf531	[BOLT] Fix local out-of-range stub issue in LongJmp (#73918 ) If a local stub is out-of-range, at LongJmp we will try to find another local stub first. However, the original implementation does not work as expected and it leads to an infinite loop between replaceTargetWithStub and fixBranches. After this patch, we first convert the target of BB back to the target of the local stub, and then look up for other valid local stubs and so on.	2023-12-11 10:38:28 +08:00
Nathan Sidwell	9596676e65	[BOLT] Determine address size from binary (#74870 ) Query the executable for address size.	2023-12-09 14:39:57 -05:00
Ho Cheung	fa5486e487	[BOLT] [Passes] Fix two compile warnings in BOLT (#73086 ) Fix build issue on Windows. issue:#73085 @maksfb PTAL thank you	2023-12-06 11:19:07 -08:00
eleviant	f20af7372f	[bolt] Support arm64 FP register spills (#73021 ) At the moment llvm-bolt fails when analyzing jump tables on aarch64 in case FP register spill/reload is used.	2023-12-05 20:32:58 +01:00
ShatianWang	296088bdf3	[BOLT][NFC] Remove unused code for CDSplit (#74136 ) This diff removes JumpInfo related code that is no longer needed by CDSplit from SplitFunctions.cpp.	2023-12-01 15:21:30 -05:00
ShatianWang	4483cf2d8b	[BOLT] CDSplit main logic part 2/2 (#74032 ) This diff implements the main splitting logic of CDSplit. CDSplit processes functions in a binary in parallel. For each function BF, it assumes that all other functions are hot-cold split. For each possible hot-warm split point of BF, it computes its corresponding SplitScore, and chooses the split point with the best SplitScore. The SplitScore of each split point is computed in the following way: each call edge or jump edge has an edge score that is proportional to its execution count, and inversely proportional to its distance. The SplitScore of a split point is a sum of edge scores over a fixed set of edges whose distance can change due to hot-warm splitting BF. This set contains all cover calls in the form of X->Y or Y->X given function order [... X ... BF ... Y ...]; we refer to the sum of edge scores over the set of cover calls as CoverCallScore. This set also contains all jump edges (branches) within BF as well as all call edges originated from BF; we refer to the sum of edge scores over this set of edges as LocalScore. CDSplit finds the split index maximizing CoverCallScore + LocalScore.	2023-11-30 23:17:11 -05:00
ShatianWang	56bbf8135e	[BOLT] CDSplit main logic part 1/2 (#73895 ) This diff defines and initializes auxiliary variables used by CDSplit and implements two important helper functions. The first helper function approximates the block level size increase if a function is hot-warm split at a given split index (X86 specific). The second helper function finds all calls in the form of X->Y or Y->X for each BF given function order [... X ... BF ... Y ...]. These calls are referred to as "cover calls". Their distance will decrease if BF's hot fragment size is further reduced by hot-warm splitting. NFC.	2023-11-30 20:55:36 -05:00
Maksim Panchenko	4f3081296f	[BOLT][NFC] Fix comment (#73983 ) Fix off-by-one error in comment.	2023-11-30 14:31:38 -08:00
Alexander Yermolovich	52be47b890	[BOLT][DWARF] Add support to create path (#73884 ) When option --dwarf-output-path is specified, if the path does not exist BOLT will now create it. This is what also happens when --plugin-opt=dwo_dir=<value> is specified to LLD.	2023-11-30 09:41:01 -08:00
ShatianWang	c43d0432ef	[BOLT] Create .text.warm for 3-way splitting (#73863 ) This commit explicitly adds a warm code section, .text.warm, when -split-functions -split-strategy=cdsplit is used. This replaces the previous approach of using .text.cold.0 as warm and .text.cold.1 as cold in 3-way function splitting. NFC.	2023-11-29 22:42:36 -05:00
Maksim Panchenko	4bcbbe1f70	[BOLT] Refactor fixBranches() (#73752 ) Simplify code in fixBranches(). Mostly NFC, except the x86-specific check for code fragments now takes into account presence of more than two fragments. Should only matter when we split code into multiple fragments and can run fixBranches() more than once. Also, don't replace a branch target with the same one, as such operation may allocate memory for extra MCSymbolRefExpr.	2023-11-29 16:24:16 -08:00
ShatianWang	076bd22f57	[BOLT] Add structure of CDSplit to SplitFunctions (#73430 ) This commit establishes the general structure of the CDSplit strategy in SplitFunctions without incorporating the exact splitting logic. With -split-functions -split-strategy=cdsplit, the SplitFunctions pass will run twice: the first time is before function reordering and functions are hot-cold split; the second time is after function reordering and functions are hot-warm-cold split based on the fixed function ordering. Currently, all functions are hot-warm split after the entry block in the second splitting pass. Subsequent commits will introduce the precise splitting logic. NFC.	2023-11-29 15:43:21 -05:00
Maksim Panchenko	0acfe8483a	[BOLT][DWARF] Fix output ranges for deleted code (#73464 ) Set range low_pc to 0 for DIEs that correspond to deleted code. Fixes #73428	2023-11-28 22:40:53 -08:00
Alexander Yermolovich	00dbea7c73	[BOLT][DWARF][NFC] Added const to variable (#73731 ) Nit followup to 72729.	2023-11-28 17:30:28 -08:00
Alexander Yermolovich	b47b3bee7b	[BOLT][DWARF] Fix handling of DWARF5 DWP (#72729 ) Fixed handling of DWP as input. Before BOLT crashed. Now it will write out correct CU, and all the TUs. Potential future improvement is to scan all the TUs used in this CU, and only include those.	2023-11-28 15:54:14 -08:00
spupyrev	e7dd596c68	[BOLT] Use deterministic xxh3 for computing BF/BB hashes (#72542 ) std::hash and ADT/Hashing::hash_value are non-deterministic functions whose results might vary across implementation/process/execution. Using xxh3 instead for computing hashes of BinaryFunctions and BinaryBasicBlock for stale profile matching. (A possible alternative is to use ADT/StableHashing.h based on FNV hashing but xxh3 seems to be more popular in LLVM) This is to address https://github.com/llvm/llvm-project/issues/65241.	2023-11-27 14:45:46 -08:00
Maksim Panchenko	f4834255d3	[BOLT] Reset output addresses for deleted blocks (#73429 ) This is a follow-up to #73076. We need to reset output addresses for deleted blocks, otherwise the address translation may mistakenly attribute input address of a deleted block to a non-zero address. While working on a test case, I've discovered that DWARF output ranges were already broken for deleted basic blocks: #73428. I will provide a test case for this PR with a DWARF address range fix PR.	2023-11-25 23:23:47 -08:00
Maksim Panchenko	365114292a	[BOLT][NFC] Refactor function state check (#73420 ) Remove redundant check in updateOutputValues().	2023-11-25 21:09:54 -08:00
ShatianWang	d333c0e062	[BOLT] Extend calculateEmittedSize() for block size calculation (#73076 ) This commit modifies BinaryContext::calculateEmittedSize() to update the BinaryBasicBlock::OutputAddressRange of each basic block in the function in place. BinaryBasicBlock::getOutputSize() now gives the emitted size of the basic block.	2023-11-23 15:28:31 -05:00
Ho Cheung	3af586f797	[BOLT] Fix type mismatch error (#73016 ) Fix build issue on Windows. Fixes #73006	2023-11-21 19:13:46 -08:00
llongint	f3e54f2f97	[BOLT][NFC] Extract a function for dump MCInst (#67225 ) In GDB debugging, obtaining the assembly representation of MCInst is more intuitive.	2023-11-21 20:30:44 +08:00
Maksim Panchenko	84602066a6	[BOLT] Fix C++ exceptions when LPStart is specified (#72737 ) Whenever LPStartEncoding was different from DW_EH_PE_omit, we used to miscalculate LPStart. As a result, landing pads were assigned wrong addresses. Fix that.	2023-11-20 20:55:38 -08:00
Maksim Panchenko	f653f6d57a	[BOLT][NFC] Delete unused declarations (#72596 )	2023-11-16 23:36:19 -08:00
JohnLee1243	ae51ec84bb	[Bolt] Solving pie support issue (#65494 ) PIE is supported by default since clang 14. It causes a parsing error when using perf2bolt. The reason is that the base address cannot be obtained correctly. Fix the method of getting the base address. If SegInfo.Alignment is not equal to pagesize, alignDown(SegInfo.FileOffset, SegInfo.Alignment) can not equal to FileOffset. So the SegInfo.FileOffset and FileOffset should be aligned by SegInfo.Alignment first and then compared for equality. The .text segment's offset from base address in VAS is aligned by pagesize. So MMapAddress's offset from base address is alignDown(SegInfo.Address, pagesize) instead of alignDown(SegInfo.Address, SegInfo.Alignment). So the way the base address is calculated should be changed. Co-authored-by: Li Zhuohang <lizhuohang3@huawei.com>	2023-11-16 15:05:06 +08:00
Vladislav Khmelevsky	5b59540661	[BOLT] Enhance fixed indirect branch handling (#71324 ) Previously HasFixedIndirectBranch was set in BF to set isSimple to false later because of unreachable bb elimination pass which might remove the BB with its symbols accessed by other instructions than calls. It seems that a better solution would be to add extra entry point on target offset instead of marking BF as non-simple.	2023-11-16 09:30:55 +04:00
Vladislav Khmelevsky	c5a306f07e	[BOLT] Fix LSDA section handling (#71821 ) Currently BOLT finds LSDA section by its name .gcc_except_table.main . But sometimes it might have suffix e.g. .gcc_except_table.main. Find LSDA section by its address, rather than by its name. Fixes #71804	2023-11-15 23:21:50 +04:00
Maksim Panchenko	e823136d43	[BOLT] Refactor --keep-nops option. NFC. (#72228 ) Run RemoveNops pass only if --keep-nops is set to false (default).	2023-11-14 11:28:13 -08:00
Maksim Panchenko	f633f325a1	[BOLT] Fix NOP instruction emission on x86 (#72186 ) Use MCAsmBackend::writeNopData() interface to emit NOP instructions on x86. There are multiple forms of NOP instruction on x86 with different sizes. Currently, LLVM's assembly/disassembly does not support all forms correctly which can lead to a breakage of input code semantics, e.g. if the program relies on NOP instructions for reserving a patch space. Add "--keep-nops" option to preserve NOP instructions.	2023-11-13 18:12:39 -08:00
Maksim Panchenko	2db9b6a93f	[BOLT] Make instruction size a first-class annotation (#72167 ) When NOP instructions are used to reserve space in the code, e.g. for patching, it becomes critical to preserve their original size while emitting the code. On x86, we rely on "Size" annotation for NOP instructions size, as the original instruction size is lost in the disassembly/assembly process. This change makes instruction size a first-class annotation and is effectively NFCI. A follow-up diff will use the annotation for code emission.	2023-11-13 14:33:39 -08:00
Maksim Panchenko	ec4a03c658	[BOLT] Enhance LowerAnnotations pass. NFCI. (#71847 ) After #70147, all primary annotation types are stored directly in the instruction and hence there's no need for the temporary storage we've used previously for repopulating preserved annotations.	2023-11-12 19:34:42 -08:00
Alexander Yermolovich	ce17c6d3ba	[BOLT][DWARF] Fix --dwarf-output-path (#71886 ) Fixed a bug where when --dwarf-output-path is specified and DW_AT_dwo_name contains part of the path the output path would contain both, which led to llvm-bolt crash, because the path didn't exist. Example: llvm-bolt .... --dwarf-output-path=/some/path/ DW_AT_dwo_name ("objects/o1/split.dwo") It would try to write .dwo file to /some/path/objects/o1/split.dwo.dwo instead of to /some/path/split.dwo.dwo	2023-11-10 13:18:57 -08:00
Vladislav Khmelevsky	6206817380	[BOLT][AArch64] Fix ADR relaxation (#71835 ) Currently we have an optimization that if the ADR points to the same function we might skip its relaxation. But it doesn't take into account that BF might be split, in such situation we still need to relax it. And just in case also relax if the initial BF size is >= 1MB. Fixes #71822	2023-11-10 11:48:03 +04:00
Vladislav Khmelevsky	cf18f142c0	[BOLT] Read .rela.dyn in static non-pie binary (#71635 ) Static non-pie binary doesn't have DYNAMIC segment and BOLT skips reading .rela.dyn section because of it. But such binaries might have this section for example to store IFUNC relocation which is resolved by linked-in startup files, so force reading this section for static executables.	2023-11-10 11:47:12 +04:00
Vladislav Khmelevsky	abec50cb93	[BOLT][AArch64] Fix strict usage during ADR Relax (#71377 ) Currently strict mode is used to expand number of optimized functions, not to shrink it. Revert the option usage in the pass, so passing strict option would relax adr instruction even if there are no nops around it. Also add check for nop after adr instruction.	2023-11-10 11:46:36 +04:00
Vladislav Khmelevsky	c6c04a83a7	[BOLT] Run EliminateUnreachableBlocks in parallel (#71299 ) The wall time for this pass decreased on my laptop from ~80 sec to 5 sec processing the clang.	2023-11-10 00:46:04 +04:00
spaette	1a2f83366b	[BOLT] Fix typos (#68121 ) Closes https://github.com/llvm/llvm-project/issues/63097 Before merging please make sure the change to bolt/include/bolt/Passes/StokeInfo.h is correct. bolt/include/bolt/Passes/StokeInfo.h ```diff // This Pass solves the two major problems to use the Stoke program without - // proting its code: + // probing its code: ``` I'm still not happy about the awkward wording in this comment. bolt/include/bolt/Passes/FixRelaxationPass.h ``` $ ed -s bolt/include/bolt/Passes/FixRelaxationPass.h <<<'9,12p' // This file declares the FixRelaxations class, which locates instructions with // wrong targets and fixes them. Such problems usually occures when linker // relaxes (changes) instructions, but doesn't fix relocations types properly // for them. $ ``` bolt/docs/doxygen.cfg.in bolt/include/bolt/Core/BinaryContext.h bolt/include/bolt/Core/BinaryFunction.h bolt/include/bolt/Core/BinarySection.h bolt/include/bolt/Core/DebugData.h bolt/include/bolt/Core/DynoStats.h bolt/include/bolt/Core/Exceptions.h bolt/include/bolt/Core/MCPlusBuilder.h bolt/include/bolt/Core/Relocation.h bolt/include/bolt/Passes/FixRelaxationPass.h bolt/include/bolt/Passes/InstrumentationSummary.h bolt/include/bolt/Passes/ReorderAlgorithm.h bolt/include/bolt/Passes/StackReachingUses.h bolt/include/bolt/Passes/StokeInfo.h bolt/include/bolt/Passes/TailDuplication.h bolt/include/bolt/Profile/DataAggregator.h bolt/include/bolt/Profile/DataReader.h bolt/lib/Core/BinaryContext.cpp bolt/lib/Core/BinarySection.cpp bolt/lib/Core/DebugData.cpp bolt/lib/Core/DynoStats.cpp bolt/lib/Core/Relocation.cpp bolt/lib/Passes/Instrumentation.cpp bolt/lib/Passes/JTFootprintReduction.cpp bolt/lib/Passes/ReorderData.cpp bolt/lib/Passes/RetpolineInsertion.cpp bolt/lib/Passes/ShrinkWrapping.cpp bolt/lib/Passes/TailDuplication.cpp bolt/lib/Rewrite/BoltDiff.cpp bolt/lib/Rewrite/DWARFRewriter.cpp bolt/lib/Rewrite/RewriteInstance.cpp bolt/lib/Utils/CommandLineOpts.cpp bolt/runtime/instr.cpp 
bolt/test/AArch64/got-ld64-relaxation.test bolt/test/AArch64/unmarked-data.test bolt/test/X86/Inputs/dwarf5-cu-no-debug-addr-helper.s bolt/test/X86/Inputs/linenumber.cpp bolt/test/X86/double-jump.test bolt/test/X86/dwarf5-call-pc-function-null-check.test bolt/test/X86/dwarf5-split-dwarf4-monolithic.test bolt/test/X86/dynrelocs.s bolt/test/X86/fallthrough-to-noop.test bolt/test/X86/tail-duplication-cache.s bolt/test/runtime/X86/instrumentation-ind-calls.s	2023-11-09 11:29:46 -08:00
Maksim Panchenko	11f52f783a	[BOLT][DWARF] Fix invalid address ranges (#71474 ) When NOP instructions are removed by BOLT and a DWARF address range falls past the removed instructions, it may lead to invalid DWARF ranges in the output binary. E.g. the range may fall outside of the basic block boundaries. This fix makes sure the modified range fits within the containing basic block. A proper fix requires tracking instructions within the block and will come in a different PR.	2023-11-09 09:55:49 -08:00
Maksim Panchenko	254ccb95e8	[BOLT] Follow-up to "Fix incorrect basic block output addresses" (#71630 ) In `8244ff6739`, I've introduced an assertion that incorrectly used BasicBlock::empty(). Some basic blocks may contain only pseudo instructions and thus BB->empty() will evaluate to false, while the actual code size will be zero.	2023-11-08 10:53:36 -08:00
Job Noorman	96b5e092dc	[BOLT] Support instrumentation hook via DT_FINI_ARRAY (#67348 ) BOLT currently hooks its instrumentation finalization function via `DT_FINI`. However, this method of calling finalization routines is not supported anymore on newer ABIs like RISC-V. `DT_FINI_ARRAY` is preferred there. This patch adds support for hooking into `DT_FINI_ARRAY` instead if the binary does not have a `DT_FINI` entry. If it does, `DT_FINI` takes precedence so this patch should not change how the currently supported instrumentation targets behave. `DT_FINI_ARRAY` points to an array in memory of `DT_FINI_ARRAYSZ` bytes. It consists of pointer-length entries that contain the addresses of finalization functions. However, the addresses are only filled-in by the dynamic linker at load time using relative relocations. This makes hooking via `DT_FINI_ARRAY` a bit more complicated than via `DT_FINI`. The implementation works as follows: - While scanning the binary: find the section where `DT_FINI_ARRAY` points to, read its first dynamic relocation and use its addend to find the address of the fini function we will use to hook; - While writing the output file: overwrite the addend of the dynamic relocation with the address of the runtime library's fini function. Updating the dynamic relocation required a bit of boilerplate: since dynamic relocations are stored in a `std::multiset` which doesn't support getting mutable references to its items, functions were added to `BinarySection` to take an existing relocation and insert a new one.	2023-11-08 11:01:10 +00:00
Vladislav Khmelevsky	e2f1a95f2a	[BOLT][AArch64] Handle IFUNCS properly (#71104 ) Currently we were testing only the binaries compiled with O0, which results in indirect call to the IFUNC trampoline and the trampoline has associated IFUNC symbol with it. Compiling with O3 results in directly calling the IFUNC trampoline and no symbols are associated with it, the IFUNC symbol address becomes the same as IFUNC resolver address. Since no symbol was associated the BF was not created before PLT analysis and by the algorithm we're going to analyze target relocation. As we're expecting the JUMP relocation we're also expecting the associated symbol with it to be present. But for IFUNC relocation the IRELATIVE relocation is used and no symbol is associated with it, the addend value is pointing to the target symbol, so we need to find BF using it and use its symbol in this situation. Currently this is checked only for AArch64 platform, so I've limited it in code to use this logic only for this platform, although I wouldn't be surprised if other platforms need to activate this logic too.	2023-11-08 11:41:43 +04:00
Vladislav Khmelevsky	485075c095	[BOLT][AArch64] Don't change layout in PatchEntries (#71278 ) Due to LongJmp pass that is executed before PatchEntries we can't ignore the function here since it would change pre-calculated output layout. The test reloc-26 relied on the wrong behavior, rewritten to unittest. This is also an attempt to fix #70771	2023-11-08 11:38:46 +04:00
Vladislav Khmelevsky	2e6f722b88	[BOLT] Move instrumentation option check (NFC) (#71581 ) Move options check from emitBinary to more proper adjustCommandLineOptions.	2023-11-08 01:54:50 +04:00
Vladislav Khmelevsky	66432943e9	[BOLT] Fix typo (NFC) (#71579 ) instrumentation->hugify	2023-11-08 01:54:31 +04:00

696 Commits