clang-p2996

Author	SHA1	Message	Date
sinan	c3bbc3a57d	[BOLT] Fix logs with no hex convention (#112650 ) Add `utohexstr` to ensure that offsets/addresses are correctly formatted as hexadecimal values.	2024-10-18 09:46:41 +08:00
ShatianWang	4cab01f072	[BOLT] Profile quality stats -- CFG discontinuity (#109683 ) In a perfect profile, each positive-execution-count block in the function’s CFG should be reachable from a positive-execution-count function entry block through a positive-execution-count path. This new pass checks how well the BOLT input profile satisfies this “CFG continuity” property. More specifically, for each of the hottest 1000 functions, the pass calculates the function’s fraction of basic block execution counts that is “unreachable”. It then reports the 95th percentile of the distribution of the 1000 unreachable fractions in a single BOLT-INFO line. The smaller the reported value is, the better the BOLT profile satisfies the CFG continuity property. The default value of 1000 above can be changed via the hidden BOLT option `-num-functions-for-continuity-check=[N]`. If more detailed stats are needed, `-v=1` can be added to the BOLT invocation: the hottest N functions will be grouped into 5 equally-sized buckets, from the hottest to the coldest; for each bucket, various summary statistics of the distribution of the fractions and the raw unreachable execution counts will be reported.	2024-10-08 19:07:43 -04:00
Youngsuk Kim	0a5edb4de4	[bolt] Don't call llvm::raw_string_ostream::flush() (NFC) Don't call raw_string_ostream::flush(), which is essentially a no-op. As specified in the docs, raw_string_ostream is always unbuffered. ( `65b13610a5` for further reference )	2024-09-23 17:07:11 -05:00
Kristof Beyls	6d216fb7b8	[perf2bolt] Improve heuristic to map in-process addresses to specific… (#109397 ) … segments in Elf binary. The heuristic is improved by also taking into account that only executable segments should contain instructions. Fixes #109384.	2024-09-23 15:14:51 +02:00
sinan	31ac3d092b	[BOLT] Add .iplt support to x86 (#106513 ) Add X86 support for parsing .iplt section and symbols.	2024-09-23 18:22:43 +08:00
Amir Ayupov	86ec59e2f7	[BOLT] Only parse probes for profiled functions in profile-write-pseudo-probes mode (#106365 ) Implement selective probe parsing for profiled functions only when emitting probe information to YAML profile as suggested in https://github.com/llvm/llvm-project/pull/102904#pullrequestreview-2248714190 For a large binary, this reduces probe parsing processing time from 10.5925s to 5.6295s and peak RSS from 10.54 to 7.98 GiB.	2024-09-11 16:33:34 -07:00
Amir Ayupov	c820bd3e33	[BOLT][NFC] Rename profile-use-pseudo-probes The flag currently controls writing of probe information in YAML profile. #99891 adds a separate flag to use probe information for stale profile matching. Thus `profile-use-pseudo-probes` becomes a misnomer and `profile-write-pseudo-probes` better captures the intent. Reviewers: maksfb, WenleiHe, ayermolo, rafaelauler, dcci Reviewed By: rafaelauler Pull Request: https://github.com/llvm/llvm-project/pull/106364	2024-09-11 16:27:33 -07:00
Amir Ayupov	a66ce58ac6	[BOLT] Drop suffixes in parsePseudoProbe GUID assignment (#106243 ) Pseudo probe function records contain GUIDs assigned by the compiler using an IR function name. Thus suffixes added later (e.g. `.llvm.` for internal symbols, `.destroy`/`.resume` for coroutine fragments, and `.cold`/`.warm` for split fragments) cause GUID mismatch. Address that by dropping those suffixes using `getCommonName` which is a parametrized form of `getLTOCommonName`.	2024-09-11 14:42:51 -07:00
Amir Ayupov	a79cf0228e	[MC][NFC] Use vector for GUIDProbeFunctionMap Replace unordered_map with a vector. Pre-parse the section to statically allocate storage. Use BumpPtrAllocator for FuncName strings, keep StringRef in FuncDesc. Reduces peak RSS of pseudo probe parsing from 9.08 GiB to 8.89 GiB as part of perf2bolt with a large binary. Test Plan: ``` bin/llvm-lit -sv test/tools/llvm-profgen ``` Reviewers: wlei-llvm, rafaelauler, dcci, maksfb, ayermolo Reviewed By: wlei-llvm Pull Request: https://github.com/llvm/llvm-project/pull/102905	2024-08-26 09:15:53 -07:00
Amir Ayupov	ee09f7d1fc	[MC][NFC] Reduce Address2ProbesMap size Replace the map from addresses to list of probes with a flat vector containing probe references sorted by their addresses. Reduces pseudo probe parsing time from 9.56s to 8.59s and peak RSS from 9.66 GiB to 9.08 GiB as part of perf2bolt processing a large binary. Test Plan: ``` bin/llvm-lit -sv test/tools/llvm-profgen ``` Reviewers: maksfb, rafaelauler, dcci, ayermolo, wlei-llvm Reviewed By: wlei-llvm Pull Request: https://github.com/llvm/llvm-project/pull/102904	2024-08-26 09:14:35 -07:00
Amir Ayupov	04ebd1907c	[MC][NFC] Statically allocate storage for decoded pseudo probes and function records Use #102774 to allocate storage for decoded probes (`PseudoProbeVec`) and function records (`InlineTreeVec`). Leverage that to also shrink sizes of `MCDecodedPseudoProbe`: - Drop Guid since it's accessible via `InlineTree`. `MCDecodedPseudoProbeInlineTree`: - Keep track of probes and inlinees using `ArrayRef`s now that probes and function records belonging to the same function are allocated contiguously. This reduces peak RSS from 13.7 GiB to 9.7 GiB and pseudo probe parsing time (as part of perf2bolt) from 15.3s to 9.6s for a large binary with 400MiB .pseudo_probe section containing 43M probes and 25M function records. Depends on: #102774 #102787 #102788 Reviewers: maksfb, rafaelauler, dcci, ayermolo, wlei-llvm Reviewed By: wlei-llvm Pull Request: https://github.com/llvm/llvm-project/pull/102789	2024-08-26 09:09:13 -07:00
Amir Ayupov	121ed07975	[MC][NFC] Count pseudo probes and function records Pre-parse pseudo probes section counting the number of probes and function records. These numbers are used in follow-up diff to pre-allocate vectors for decoded probes and inline tree nodes. Additional benefit is avoiding error handling during parsing. This pre-parsing is fast: for a 404MiB .pseudo_probe section with 43373881 probes and 25228770 function records, it only takes 0.68±0.01s. The total time of buildAddress2ProbeMap is 21s. Reviewers: dcci, maksfb, rafaelauler, wlei-llvm, ayermolo Reviewed By: wlei-llvm Pull Request: https://github.com/llvm/llvm-project/pull/102774	2024-08-26 09:05:34 -07:00
Sayhaan Siddiqui	6aad62cf5b	[BOLT][DWARF] Add parallelization for processing of DWO debug information (#100282 ) Enables parallelization for the processing of DWO CUs.	2024-08-08 16:41:51 -07:00
Davide Italiano	e49549ff19	Revert "[BOLT] Abort on out-of-section symbols in GOT (#100801 )" This reverts commit `a4900f0d93`.	2024-08-07 20:52:19 -07:00
Vladislav Khmelevsky	445023f173	Revert "[BOLT] Move ADRRelaxationPass (#101371 )" (#102333 ) This reverts commit `750b12f06b`. The pass should run after the splitting phase but before nop removal.	2024-08-07 21:03:51 +04:00
Sayhaan Siddiqui	62e894e0d7	[BOLT][DWARF][NFC] Move Arch assignment out of createBinaryContext (#102054 ) Moves the assignment of Arch out of createBinaryContext to prevent data races when parallelized.	2024-08-07 16:55:39 +00:00
Vladislav Khmelevsky	a4900f0d93	[BOLT] Abort on out-of-section symbols in GOT (#100801 ) This patch aborts BOLT execution if it finds an out-of-section (section end) symbol in the GOT table. In order to handle such situations properly in the future, we would need an arch-dependent way to analyze relocations or their sequences, e.g., for ARM it would probably be ADRP + LDR analysis in order to get the GOT entry address. Currently, it is also challenging because GOT-related relocation symbols are replaced with __BOLT_got_zero. Anyway, it seems to be quite a rare case, which seems to be related only to static binaries. For the most part, it seems that it should be handled at the linker stage, since a static binary should not have a GOT table at all. The LLD linker with relaxations enabled would replace instruction addresses from the GOT directly with target symbols, which eliminates the problem. Anyway, in order to achieve detection of such cases, this patch fixes a few things in BOLT: 1. For the end symbols, we're now using the section provided by the ELF binary. Previously it would be tied to a wrong section found by symbol address. 2. The end symbols would have limited registration: we would only add them to the name->data GlobalSymbols map, since using the address->data BinaryDataMap map would likely be impossible due to the address duality of such symbols. 3. The outdated BD->getSection (currently returning a reference, not a pointer) check in postProcessSymbolTable is replaced by a getSize check in order to allow zero-sized top-level symbols if they are located in zero-sized sections. For the most part, such things could only be found in tests, but I don't see a reason not to handle such cases. 4. Updated the section-end-sym test and removed the x86_64 requirement since there is no reason for it (tested on aarch64 linux). The test was provided by peterwaller-arm (thank you) in #100096 and slightly modified by me.	2024-08-07 16:26:12 +04:00
Vladislav Khmelevsky	097ddd3565	[BOLT] Fix relocations handling (#100890 ) After porting BOLT to RISCV, some of the relocations were broken on both AArch64 and X86. On AArch64, an example of broken relocations would be GOT: while handling them, we should replace the symbol with __BOLT_got_zero in order to address the GOT entry, not the symbol that addresses this entry. This is done later in the code, so it is too early to add the relocation here. On X86, it is a mistake to add relocations without an addend. This is the exact problem raised in #97937. Due to different code generation, I had to use a gcc-generated yaml test, since with clang I wasn't able to reproduce the problem. Added tests for both architectures and made the problematic condition RISCV-specific.	2024-08-07 16:25:46 +04:00
Vladislav Khmelevsky	750b12f06b	[BOLT] Move ADRRelaxationPass (#101371 ) For non-simple functions, we need a nop instruction to be present to transform ADR into an ADRP+ADD sequence, so run this pass before the remove-nops pass.	2024-08-07 16:23:38 +04:00
sinan	6c8933e1a0	[BOLT] Skip PLT search for zero-value weak reference symbols (#69136 ) Take a common weak reference pattern, for example: ``` __attribute__((weak)) void undef_weak_fun(); if (&undef_weak_fun) undef_weak_fun(); ``` In this case, an undefined weak symbol `undef_weak_fun` has an address of zero, and BOLT incorrectly changes the relocation for the corresponding symbol to symbol@PLT, leading to incorrect runtime behavior.	2024-08-07 18:02:42 +08:00
sinan	734c0488b6	[BOLT] Support map other function entry address (#101466 ) Allow BOLT to map the old address to a new binary address if the old address is the entry of the function.	2024-08-07 15:57:25 +08:00
Amir Ayupov	3f51bec466	[BOLT][NFC] Print timers in perf2bolt invocation When BOLT is run in AggregateOnly mode (perf2bolt), it exits with code zero, so destructors are not run and TimerGroup never prints the timers. Add explicit printing just before the exit to honor options requesting timers (`--time-rewrite`, `--time-aggr`). Test Plan: updated bolt/test/timers.c Reviewers: ayermolo, maksfb, rafaelauler, dcci Reviewed By: dcci Pull Request: https://github.com/llvm/llvm-project/pull/101270	2024-07-31 22:14:52 -07:00
Amir Ayupov	fb97b4f962	[BOLT][NFC] Add timers for MetadataManager invocations Test Plan: added bolt/test/timers.c Reviewers: ayermolo, maksfb, rafaelauler, dcci Reviewed By: dcci Pull Request: https://github.com/llvm/llvm-project/pull/101267	2024-07-31 22:12:34 -07:00
Sayhaan Siddiqui	910012e7c5	[BOLT][DWARF][NFC] Split DIEBuilder::finish (#101244 ) Split DIEBuilder::finish so that code updating .debug_names is in a separate function.	2024-07-31 13:41:38 -07:00
Sayhaan Siddiqui	79dcd93b70	[BOLT][DWARF] Remove option to write to DWP (#100771 ) Remove the --write-dwp option as well as related code and tests.	2024-07-30 16:58:01 -07:00
Sayhaan Siddiqui	9a3e66e314	[BOLT][DWARF][NFC] Fix DebugStrOffsetsWriter (#100672 ) Fix DebugStrOffsetsWriter so updateAddressMap can't be called after it is finalized.	2024-07-26 18:58:25 -07:00
Sayhaan Siddiqui	b33ef5bd68	[BOLT][DWARF][NFC] Add mc opt to DWARFRewriter.cpp (#100800 ) Running into an error with removing DWP where the assertion `RelaxAllView && "RegisterMCTargetOptionsFlags not created."` failed. This is a result of DWP bringing the mc::RegisterMCTargetOptionsFlags option in, and the option being removed along with DWP. The need for this option didn't originally exist because we didn't use MC in DWARFRewriter, but we switched to using DWARFStreamer, which needed the option. https://reviews.llvm.org/D75579 https://reviews.llvm.org/D106417	2024-07-26 14:09:46 -07:00
Amir Ayupov	4d19676de4	[BOLT] Add profile-use-pseudo-probes option Move pseudo probe profile generation under --profile-use-pseudo-probes option. Note that updating pseudo probes is independent from this flag. Test Plan: updated pseudoprobe-decoding-inline.test Reviewers: maksfb, rafaelauler, ayermolo, dcci, WenleiHe Reviewed By: WenleiHe Pull Request: https://github.com/llvm/llvm-project/pull/100299	2024-07-24 07:31:01 -07:00
Sayhaan Siddiqui	ea4a348098	[BOLT][DWARF][NFC] Move initialization of DWOName outside of lambda (#99728 ) Followup to the splitting of processUnitDIE, moves code that accesses common resource to be outside of the function that will be parallelized. Followup to #99957	2024-07-23 17:30:54 -07:00
Sayhaan Siddiqui	7cd7a1eab4	[BOLT][DWARF][NFC] Split processUnitDIE into two lambdas (#99957 ) Split processUnitDIE into two lambdas to separate the processing of DWO CUs and CUs in the main binary.	2024-07-23 12:59:40 -07:00
Sayhaan Siddiqui	bdee9b05de	Revert "[BOLT][DWARF][NFC] Split processUnitDIE into two lambdas" (#99904 ) Reverts llvm/llvm-project#99225	2024-07-22 12:31:51 -07:00
Sayhaan Siddiqui	6747f12931	[BOLT][DWARF][NFC] Split processUnitDIE into two lambdas (#99225 ) Split processUnitDIE into two lambdas to separate the processing of DWO CUs and CUs in the main binary.	2024-07-19 17:52:49 -07:00
Daniel Hill	b686600a57	[BOLT] Skip instruction shortening (#93032 ) Add the ability to disable the instruction shortening pass through --shorten-instructions=false	2024-07-19 16:52:01 -07:00
Sayhaan Siddiqui	d54ec64f67	[BOLT][DWARF] Remove deprecated opt (#99575 ) Remove deprecated DeterministicDebugInfo option and its uses.	2024-07-19 14:03:50 -07:00
Amir Ayupov	9b007a199d	[BOLT] Expose pseudo probe function checksum and GUID (#99389 ) Add a BinaryFunction field for pseudo probe function GUID. Populate it during pseudo probe section parsing, and emit it in YAML profile (both regular and BAT), along with function checksum. To be used for stale function matching. Test Plan: update pseudoprobe-decoding-inline.test	2024-07-18 20:58:16 -07:00
Sayhaan Siddiqui	c0c157a518	[BOLT][DWARF][NFC] Remove DWO ranges base (#99284 ) Removes getters and setters for DWO ranges base due to it not being used.	2024-07-18 09:24:46 -07:00
Vladislav Khmelevsky	51122fb446	[BOLT][NFC] Fix build (#99361 ) On clang 14 the build is failing with: reference to local binding 'ParentName' declared in enclosing function 'llvm::bolt::RewriteInstance::registerFragments'	2024-07-17 23:17:12 +04:00
Amir Ayupov	3fe50b6dde	[BOLT] Store FileSymRefs in a multimap With aggressive ICF, it's possible for different local symbols (under different FILE symbols) to be mapped to the same address. FileSymRefs only keeps a single SymbolRef per address, which prevents fragment matching from finding the correct symbol to perform parent function lookup. Work around this issue by switching FileSymRefs to a multimap. In the future, uses of FileSymRefs can be replaced with SortedSymbols, which keeps essentially the same information. Test Plan: added ambiguous_fragment.test Reviewers: dcci, ayermolo, maksfb, rafaelauler Reviewed By: rafaelauler Pull Request: https://github.com/llvm/llvm-project/pull/98992	2024-07-16 22:14:43 -07:00
Sayhaan Siddiqui	e140a8a3c8	[BOLT][DWARF][NFC] Refactor address writers (#98094 ) Refactors address writers to create an instance for each CU and its DWO CU.	2024-07-15 23:03:43 -07:00
Sayhaan Siddiqui	7e10ad99ad	[BOLT][DWARF] Cleanup buffer initialization for DWO range writer (#97843 ) Cleanup buffer initialization for DWO range writer instances to remove empty buffer at the beginning.	2024-07-10 11:35:40 -07:00
Sayhaan Siddiqui	a972b2e9a4	[BOLT][DWARF][NFC] Cleanup RangesBase check (#97840 ) Moves check for RangesBase under check for UnitDie. This makes the flow clearer because we add RangesBase when it is a UnitDie.	2024-07-10 10:53:08 -07:00
Sayhaan Siddiqui	d283627c4a	[BOLT][DWARF][NFC] Update Die to not use std::optional (#97844 ) Updates initialization to remove unnecessary use of std::optional.	2024-07-09 16:37:09 -07:00
Sayhaan Siddiqui	a40daa34ef	[BOLT][DWARF][NFC] Cleanup version check (#97839 ) Cleans up version check to remove redundant else branch.	2024-07-09 16:36:26 -07:00
Sayhaan Siddiqui	5828b04b03	[BOLT][DWARF] Refactor legacy ranges writers (#96006 ) Refactors legacy ranges writers to create a writer for each instance of a DWO file. We now write out everything into .debug_ranges after all the DWO files are processed. This also changes the order in which ranges are written out: before, we wrote them out while in the main CU processing loop, and we now iterate through the CU buckets created by partitionCUs after the main processing loop.	2024-07-03 14:50:40 -07:00
Amir Ayupov	344228ebf4	[BOLT] Drop macro-fusion alignment (#97358 ) `9d0754ada5` dropped MC support required for optimal macro-fusion alignment in BOLT. Remove the support in BOLT as performance measurements with large binaries didn't show a significant improvement. Test Plan: macro-fusion alignment was never upstreamed, so no upstream tests are affected.	2024-07-02 09:20:41 -07:00
Fangrui Song	e3e0df391c	[BOLT] Replace the MCAsmLayout parameter with MCAssembler Continue the MCAsmLayout removal work started by `67957a45ee`.	2024-07-01 18:02:34 -07:00
Fangrui Song	dbf12b2f77	[MC] Remove MCAsmLayout::{getSymbolOffset,getBaseSymbol} The MCAsmLayout::* forwarders added by `67957a45ee` have all been removed.	2024-07-01 11:51:26 -07:00
Shaw Young	49fdbbcfed	[BOLT] Match functions with exact hash (#96572 ) Added flag '--match-profile-with-function-hash' to match functions based on exact hash. After identical and LTO name matching, more functions can be recovered for inference with exact hash, in the case of function renaming with no functional changes. Collisions are possible in the unlikely case where multiple functions share the same exact hash. The flag is off by default as it requires the processing of all binary functions and is consequently expensive. Test Plan: added hashing-based-function-matching.test.	2024-06-29 21:19:00 -07:00
Maksim Panchenko	d16b21b17d	[BOLT][Linux] Support ORC for alternative instructions (#96709 ) Alternative instruction sequences in the Linux kernel can modify the stack and thus they need their own ORC unwind entries. Since there's only one ORC table, it has to be "shared" among multiple instruction sequences. The kernel achieves this by putting a restriction on instruction boundaries. If ORC state changes at a given IP, only one of the alternative sequences can have an instruction starting/ending at this IP. Then, developers can insert NOPs to guarantee the above requirement is met. The most common use of ORC with alternatives is the "pushf; pop %rax" sequence used for paravirtualization. Note that newer kernel versions no longer use .parainstructions; instead, they utilize alternatives for the same purpose. Until we implement better support for alternatives, we can safely skip ORC entries associated with them. Fixes #87052.	2024-06-27 19:26:11 -07:00
shawbyoung	902952ae04	Revert "[𝘀𝗽𝗿] initial version" This reverts commit `bb5ab1ffe7`.	2024-06-25 08:30:29 -07:00
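
The CFG continuity metric described in commit `4cab01f072` can be illustrated with a small sketch. This is not BOLT's implementation: `Block`, `Edge`, `unreachableFraction`, and `percentile` are hypothetical names, and the nearest-rank percentile definition is an assumption. The sketch marks positive-count blocks reachable from positive-count entry blocks over positive-count edges, then returns the share of the function's block execution count that was never reached.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>
#include <vector>

// Hypothetical CFG model for illustration; not BOLT's BinaryBasicBlock API.
struct Edge {
  std::size_t To;   // index of the successor block
  uint64_t Count;   // profiled execution count of this edge
};

struct Block {
  uint64_t Count = 0;      // profiled execution count of the block
  bool IsEntry = false;    // true for function entry points
  std::vector<Edge> Succs; // outgoing CFG edges
};

// Share of the total block execution count that sits in blocks not reachable
// from a positive-count entry through positive-count edges and blocks.
double unreachableFraction(const std::vector<Block> &Blocks) {
  std::vector<bool> Reached(Blocks.size(), false);
  std::queue<std::size_t> Work;
  for (std::size_t I = 0; I < Blocks.size(); ++I)
    if (Blocks[I].IsEntry && Blocks[I].Count > 0) {
      Reached[I] = true;
      Work.push(I);
    }
  // Breadth-first search restricted to positive-count edges and blocks.
  while (!Work.empty()) {
    std::size_t B = Work.front();
    Work.pop();
    for (const Edge &E : Blocks[B].Succs) {
      assert(E.To < Blocks.size() && "edge points outside the CFG");
      if (E.Count > 0 && Blocks[E.To].Count > 0 && !Reached[E.To]) {
        Reached[E.To] = true;
        Work.push(E.To);
      }
    }
  }
  uint64_t Total = 0, Unreachable = 0;
  for (std::size_t I = 0; I < Blocks.size(); ++I) {
    Total += Blocks[I].Count;
    if (!Reached[I])
      Unreachable += Blocks[I].Count;
  }
  return Total == 0 ? 0.0 : static_cast<double>(Unreachable) / Total;
}

// Nearest-rank percentile (an assumed definition), e.g. Percent = 95.
double percentile(std::vector<double> Values, unsigned Percent) {
  assert(Percent <= 100 && "percent must be in [0, 100]");
  if (Values.empty())
    return 0.0;
  std::sort(Values.begin(), Values.end());
  std::size_t Rank = (Percent * Values.size() + 99) / 100; // ceil(P*N/100)
  if (Rank == 0)
    Rank = 1;
  return Values[Rank - 1];
}
```

In this sketch, one fraction would be computed per hot function and the collection passed to `percentile(Fractions, 95)`. Per the commit message, the real pass considers the hottest 1000 functions by default, adjustable with `-num-functions-for-continuity-check=[N]`.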


425 Commits
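
The flat layout described in commit `ee09f7d1fc` (replacing the map from addresses to probe lists with a vector of probe references sorted by address) can be sketched roughly as follows. `Probe`, `AddressProbeIndex`, and `lookup` are hypothetical names, not the actual `MCDecodedPseudoProbe` API; the sketch only shows the data-structure idea of sorted references plus binary search.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical stand-in for a decoded pseudo probe; not the MC API.
struct Probe {
  uint64_t Address;
  uint32_t Index;
};

// Flat alternative to map<address, list<probe>>: one vector of pointers into
// contiguous probe storage, sorted by address, so probes sharing an address
// are adjacent and lookups are binary searches.
class AddressProbeIndex {
public:
  using Iter = std::vector<const Probe *>::const_iterator;

  // Storage must outlive the index and must not be resized afterwards,
  // since the index holds raw pointers into it.
  explicit AddressProbeIndex(const std::vector<Probe> &Storage) {
    Sorted.reserve(Storage.size());
    for (const Probe &P : Storage)
      Sorted.push_back(&P);
    // stable_sort keeps the original relative order of same-address probes.
    std::stable_sort(Sorted.begin(), Sorted.end(),
                     [](const Probe *A, const Probe *B) {
                       return A->Address < B->Address;
                     });
  }

  // Returns the [first, last) range of probes located exactly at Addr.
  std::pair<Iter, Iter> lookup(uint64_t Addr) const {
    return std::equal_range(Sorted.begin(), Sorted.end(), Addr, ByAddress{});
  }

private:
  // Heterogeneous comparator so equal_range can compare probes with addresses.
  struct ByAddress {
    bool operator()(const Probe *P, uint64_t A) const { return P->Address < A; }
    bool operator()(uint64_t A, const Probe *P) const { return A < P->Address; }
  };
  std::vector<const Probe *> Sorted;
};
```

Compared with a node-based map holding a list per address, this keeps one pointer per probe in a single allocation, which is consistent with the kind of memory reduction the commit reports; the exact savings in BOLT depend on its real data structures.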