clang-p2996

Author	SHA1	Message	Date
Paschalis Mpeis	51003076eb	Reapply [BOLT] DataAggregator support for binaries with multiple text segments (#118023 ) When a binary has multiple text segments, the Size is computed as the difference of the last address of these segments from the BaseAddress. The base addresses of all text segments must be the same. Introduces flag 'perf-script-events' for testing, which allows passing perf events without BOLT having to parse them by invoking 'perf script'. The flag is used to pass a mock perf profile that has two memory mappings for a mock binary that has two text segments. The mapping size is updated as `parseMMapEvents` now processes all text segments.	2024-12-02 09:20:40 +00:00
Hans Wennborg	537343dea4	Revert "[BOLT] DataAggregator support for binaries with multiple text segments (#92815 )" This caused test failures, see comment on the PR: Failed Tests (2): BOLT-Unit :: Core/./CoreTests/AArch64/MemoryMapsTester/MultipleSegmentsMismatchedBaseAddress/0 BOLT-Unit :: Core/./CoreTests/X86/MemoryMapsTester/MultipleSegmentsMismatchedBaseAddress/0 > When a binary has multiple text segments, the Size is computed as the > difference of the last address of these segments from the BaseAddress. > The base addresses of all text segments must be the same. > > Introduces flag 'perf-script-events' for testing. It allows passing perf events > without BOLT having to parse them using 'perf script'. The flag is used to > pass a mock perf profile that has two memory mappings for a mock binary > that has two text segments. The size of the mapping is updated as this > change `parseMMapEvents` processes all text segments. This reverts commit `4b71b3782d`.	2024-11-26 14:59:30 +01:00
Paschalis Mpeis	4b71b3782d	[BOLT] DataAggregator support for binaries with multiple text segments (#92815 ) When a binary has multiple text segments, the Size is computed as the difference of the last address of these segments from the BaseAddress. The base addresses of all text segments must be the same. Introduces flag 'perf-script-events' for testing. It allows passing perf events without BOLT having to parse them using 'perf script'. The flag is used to pass a mock perf profile that has two memory mappings for a mock binary that has two text segments. The size of the mapping is updated as this change `parseMMapEvents` processes all text segments.	2024-11-25 13:12:43 +00:00
Amir Ayupov	74e6478f81	[BOLT] Set call to continuation count in pre-aggregated profile #109683 identified an issue with pre-aggregated profile where a call to continuation fallthrough edge count is missing (profile discontinuity). This issue only affects pre-aggregated profile but not perf data since LBR stack has the necessary information to determine if the trace (fall-through) starts at call continuation, whereas pre-aggregated fallthrough lacks this information. The solution is to look at branch records in pre-aggregated profiles that correspond to returns and assign counts to call to continuation fallthrough: - BranchFrom is in another function or DSO, - BranchTo may be a call continuation site: - not an entry point/landing pad. Note that we can't directly check if BranchFrom corresponds to a return instruction if it's in external DSO. Keep call continuation handling for perf data (`getFallthroughsInTrace`) [1] as-is due to marginally better performance. The difference is that return-converted call to continuation fallthrough is slightly more frequent than other fallthroughs since the former only requires one LBR address while the latter needs two that belong to the profiled binary. Hence return-converted fallthroughs have larger "weight" which affects code layout. [1] `DataAggregator::getFallthroughsInTrace` `fea18afeed/bolt/lib/Profile/DataAggregator.cpp (L906-L915)` Test Plan: added callcont-fallthru.s Reviewers: maksfb, ayermolo, ShatianWang, dcci Reviewed By: maksfb, ShatianWang Pull Request: https://github.com/llvm/llvm-project/pull/109486	2024-11-07 16:20:19 -08:00
Amir Ayupov	6ee5ff95ab	[BOLT] Add profile density computation Reuse the definition of profile density from llvm-profgen (#92144): - the density is computed in perf2bolt using raw samples (perf.data or pre-aggregated data), - function density is the ratio of dynamically executed function bytes to the static function size in bytes, - profile density: - functions are sorted by density in decreasing order, accumulating their respective sample counts, - profile density is the smallest density covering 99% of total sample count. In other words, BOLT binary profile density is the minimum amount of profile information per function (excluding functions in tail 1% sample count) which is sufficient to optimize the binary well. The density threshold of 60 was determined through experiments with large binaries by reducing the sample count and checking resulting profile density and performance. The threshold is conservative. perf2bolt would print the warning if the density is below the threshold and suggest to increase the sampling duration and/or frequency to reach a given density, e.g.: ``` BOLT-WARNING: BOLT is estimated to optimize better with 2.8x more samples. ``` Test Plan: updated pre-aggregated-perf.test Reviewers: maksfb, wlei-llvm, rafaelauler, ayermolo, dcci, WenleiHe Reviewed By: WenleiHe, wlei-llvm Pull Request: https://github.com/llvm/llvm-project/pull/101094	2024-10-24 18:30:59 -07:00
Amir Ayupov	08916cef7e	[BOLT] Set RawBranchCount in DataAggregator Align DataAggregator (Linux perf and pre-aggregated profile reader) to DataReader (fdata profile reader) behavior: set BF->RawBranchCount which is used in profile density computation (#101094). Reviewers: ayermolo, maksfb, dcci, rafaelauler, WenleiHe Reviewed By: WenleiHe Pull Request: https://github.com/llvm/llvm-project/pull/101093	2024-10-24 18:28:44 -07:00
Kristof Beyls	6d216fb7b8	[perf2bolt] Improve heuristic to map in-process addresses to specific segments in ELF binary (#109397 ) The heuristic is improved by also taking into account that only executable segments should contain instructions. Fixes #109384.	2024-09-23 15:14:51 +02:00
Amir Ayupov	c00c62c113	[BOLT] Add pseudo probe inline tree to YAML profile Add probe inline tree information to YAML profile, at function level: - function GUID, - checksum, - parent node id, - call site in the parent. This information is used for pseudo probe block matching (#99891). The encoding adds/changes probe information in multiple levels of YAML profile: - BinaryProfile: add pseudo_probe_desc with GUIDs and Hashes, which permits deduplication of data: - many GUIDs are duplicate as the same callee is commonly inlined into multiple callers, - hashes are also very repetitive, especially for functions with low block counts. - FunctionProfile: add inline tree (see above). Top-level function is included as root of function inline tree, which makes guid and pseudo_probe_desc_hash fields redundant. - BlockProfile: densely-encoded block probe information: - probes reference their containing inline tree node, - separate lists for block, call, indirect call probes, - block probe encoding is specialized: ids are encoded as bitset in uint64_t. If only block probe with id=1 is present, it's encoded as implicit entry (id=0, omitted). - inline tree nodes with identical probes share probe description where node indices are combined into a list. On top of #107970, profile with new probe encoding has the following characteristics (profile for a large binary): - Profile without probe information: 33MB, 3.8MB compressed (baseline). - Profile with inline tree information: 92MB, 14MB compressed. Profile processing time (YAML parsing, inference, attaching steps): - profile without pseudo probes: 5s, - profile with pseudo probes, without pseudo probe matching: 11s, - with pseudo probe matching: 12.5s. Test Plan: updated pseudoprobe-decoding-inline.test Reviewers: wlei-llvm, ayermolo, rafaelauler, dcci, maksfb Reviewed By: wlei-llvm, rafaelauler Pull Request: https://github.com/llvm/llvm-project/pull/107137	2024-09-12 20:51:35 -07:00
Amir Ayupov	ccc7a072db	[BOLT] Drop blocks without profile in BAT YAML (#107970 ) Align BAT YAML (DataAggregator) to YAMLProfileWriter which drops blocks without profile: `61372fc5db/bolt/lib/Profile/YAMLProfileWriter.cpp (L162-L176)` Test Plan: NFCI	2024-09-11 16:36:47 -07:00
Amir Ayupov	c820bd3e33	[BOLT][NFC] Rename profile-use-pseudo-probes The flag currently controls writing of probe information in YAML profile. #99891 adds a separate flag to use probe information for stale profile matching. Thus `profile-use-pseudo-probes` becomes a misnomer and `profile-write-pseudo-probes` better captures the intent. Reviewers: maksfb, WenleiHe, ayermolo, rafaelauler, dcci Reviewed By: rafaelauler Pull Request: https://github.com/llvm/llvm-project/pull/106364	2024-09-11 16:27:33 -07:00
Amir Ayupov	ee09f7d1fc	[MC][NFC] Reduce Address2ProbesMap size Replace the map from addresses to list of probes with a flat vector containing probe references sorted by their addresses. Reduces pseudo probe parsing time from 9.56s to 8.59s and peak RSS from 9.66 GiB to 9.08 GiB as part of perf2bolt processing a large binary. Test Plan: ``` bin/llvm-lit -sv test/tools/llvm-profgen ``` Reviewers: maksfb, rafaelauler, dcci, ayermolo, wlei-llvm Reviewed By: wlei-llvm Pull Request: https://github.com/llvm/llvm-project/pull/102904	2024-08-26 09:14:35 -07:00
Amir Ayupov	4d19676de4	[BOLT] Add profile-use-pseudo-probes option Move pseudo probe profile generation under --profile-use-pseudo-probes option. Note that updating pseudo probes is independent from this flag. Test Plan: updated pseudoprobe-decoding-inline.test Reviewers: maksfb, rafaelauler, ayermolo, dcci, WenleiHe Reviewed By: WenleiHe Pull Request: https://github.com/llvm/llvm-project/pull/100299	2024-07-24 07:31:01 -07:00
Amir Ayupov	c905db67a0	[BOLT] Attach pseudo probes to blocks in YAML profile Read pseudo probes in regular and BAT YAML profile generation, and attach them to YAML profile basic blocks. This exposes GUID, probe id, and probe type in profile for future use in stale profile matching. Test Plan: updated pseudoprobe-decoding-inline.test Reviewers: dcci, rafaelauler, ayermolo, maksfb Reviewed By: rafaelauler Pull Request: https://github.com/llvm/llvm-project/pull/99554	2024-07-18 21:01:40 -07:00
Amir Ayupov	9b007a199d	[BOLT] Expose pseudo probe function checksum and GUID (#99389 ) Add a BinaryFunction field for pseudo probe function GUID. Populate it during pseudo probe section parsing, and emit it in YAML profile (both regular and BAT), along with function checksum. To be used for stale function matching. Test Plan: update pseudoprobe-decoding-inline.test	2024-07-18 20:58:16 -07:00
Amir Ayupov	d1d9545ed3	[BOLT][BAT] Add entries for deleted basic blocks Deleted basic blocks are required for correct mapping of branches modified by SCTC. Increases BAT size, bytes: - large binary: 8622496 -> 8703244. - small binary (X86/bolt-address-translation.test): 928 -> 940. Test Plan: updated bb-with-two-tail-calls.s Reviewers: ayermolo, dcci, maksfb, rafaelauler Reviewed By: rafaelauler Pull Request: https://github.com/llvm/llvm-project/pull/91906	2024-05-23 19:19:07 -07:00
Amir Ayupov	465bfd41fa	[BOLT][NFC] Simplify BBHashMapTy (#91812 )	2024-05-22 16:00:51 -07:00
Amir Ayupov	1529ec085a	[BOLT][NFC] Move out PrintProgramStats from Profile into Rewrite (#93075 ) Eliminate the dependence of Profile on Passes. Test Plan: NFC	2024-05-22 13:53:41 -07:00
Amir Ayupov	97025bd9d5	[BOLT] Use getLocationName in YAMLProfileWriter (#92493 ) Disambiguate local functions using the containing file symbol in BAT mode. Make local function naming consistent across BAT fdata and YAML profiles. Test Plan: updated register-fragments-bolt-symbols.s	2024-05-21 20:24:46 -07:00
Amir Ayupov	a9b67490b2	[BOLT] Report adjusted program stats from perf2bolt in BAT mode (#91683 )	2024-05-21 18:54:15 -07:00
Kazu Hirata	1486653dcf	[BOLT] Use StringRef::contains (NFC) (#92842 )	2024-05-20 19:18:45 -07:00
Amir Ayupov	9f15aa009c	[BOLT][NFC] Rename DataAggregator::BranchInfo to TakenBranchInfo Align the name to its counterpart `FTInfo` which avoids name aliasing with llvm::bolt::BranchInfo and allows to drop namespace specifier. Test Plan: NFC Reviewers: maksfb, rafaelauler, ayermolo, dcci Reviewed By: dcci Pull Request: https://github.com/llvm/llvm-project/pull/92017	2024-05-16 20:02:51 -07:00
Amir Ayupov	4ecf2caf68	[BOLT] Use aggregated FuncBranchData in writeBATYAML Switch from FuncBranchData intermediate maps (Intra/InterIndex) to aggregated Data, same as one used by DataReader: `e62ce1f884/bolt/lib/Profile/DataReader.cpp (L385-L389)` This aligns the order of the output between YAMLProfileWriter and writeBATYAML. Test Plan: updated bolt-address-translation-yaml.test Reviewers: rafaelauler, dcci, ayermolo, maksfb Reviewed By: ayermolo, maksfb Pull Request: https://github.com/llvm/llvm-project/pull/91289	2024-05-13 14:23:32 -07:00
Kazu Hirata	f841ca0c35	Use StringRef::operator== instead of StringRef::equals (NFC) (#91864 ) I'm planning to remove StringRef::equals in favor of StringRef::operator==. - StringRef::operator==/!= outnumber StringRef::equals by a factor of 276 under llvm-project/ in terms of their usage. - The elimination of StringRef::equals brings StringRef closer to std::string_view, which has operator== but not equals. - S == "foo" is more readable than S.equals("foo"), especially for !Long.Expression.equals("str") vs Long.Expression != "str".	2024-05-12 23:08:40 -07:00
Amir Ayupov	b5af667b01	[BOLT] Map branch source address to the containing basic block in BAT YAML Fix an issue where the profile for all branches that have a BRANCHENTRY is dropped. If the branch has an entry in BAT, it will be translated to its input offset. We used to only permit the basic block offset as a branch source. Perform a lookup of containing basic block instead. Test Plan: Updated bolt-address-translation-yaml.test Reviewers: maksfb, dcci, rafaelauler, ayermolo Reviewed By: maksfb Pull Request: https://github.com/llvm/llvm-project/pull/91273	2024-05-12 17:11:09 -07:00
Amir Ayupov	4f127667ca	[BOLT] Set entry counts in BAT YAML profile (#91775 ) Align with DataReader::readProfile that sets entry block counts from FuncBranchData->EntryData. Test Plan: updated bolt-address-translation-yaml.test	2024-05-10 22:23:45 -07:00
Amir Ayupov	bbcdd4f4b2	[BOLT] Use disambiguated local names in BAT YAML Align BAT YAML to fdata profile. Test Plan: updated register-fragments-bolt-symbols.s Reviewers: dcci, rafaelauler, ayermolo, maksfb Reviewed By: dcci Pull Request: https://github.com/llvm/llvm-project/pull/91773	2024-05-10 22:18:50 -07:00
Amir Ayupov	db29f20fdd	[BOLT] Ignore returns in DataAggregator Returns are ignored in perf/pre-aggregated/fdata profile reader (see DataReader::convertBranchData). They are also omitted in YAMLProfileWriter by virtue of not having the profile attached to them in the reader, and YAMLProfileWriter converting the profile attached to BinaryFunctions. Thus, return profile is universally ignored across all profile types except BAT YAML. To make returns ignored for YAML produced in BAT mode, we can: 1) ignore them in YAMLProfileReader, 2) omit them from YAML profile in profile conversion/writing. The first option is prone to profile staleness issue, where the profiled binary doesn't match the one to be optimized, and thus returns in the profile can no longer be reliably detected (as we don't distinguish them from calls in the profile). The second option is robust to staleness but requires disassembling the branch source instruction. Test Plan: Updated bolt-address-translation-yaml.test Reviewers: rafaelauler, dcci, ayermolo, maksfb Reviewed By: maksfb Pull Request: https://github.com/llvm/llvm-project/pull/90807	2024-05-08 12:02:18 -07:00
Amir Ayupov	f2d7130579	[BOLT][NFC] Simplify DataAggregator::getFallthroughsInTrace (#90752 )	2024-05-01 21:53:49 +02:00
Amir Ayupov	5fb59e7447	[BOLT] Print program stats in perf2bolt/aggregate-only mode (#89763 )	2024-04-25 19:08:51 +02:00
Amir Ayupov	3997f0eb81	[BOLT] Cover all call sites in writeBATYAML Call site information setting was conditioned on branch information presence for a given block. However, it's possible to have sampled profile lacking one or the other for a given basic block. Iterate over branch profiles and call profiles independently to cover all recorded profile data. Depends on https://github.com/llvm/llvm-project/pull/87569 Test Plan: Updated bolt/test/X86/yaml-secondary-entry-discriminator.s Reviewers: ayermolo, dcci, maksfb, rafaelauler Reviewed By: maksfb Pull Request: https://github.com/llvm/llvm-project/pull/87743	2024-04-11 21:15:04 +02:00
Amir Ayupov	8840992667	[BOLT][BAT] Fix handling of split functions Move BAT parent function lookup outside `getLocationName`, to the scope where we retrieve `FuncBranchData` linked with the function. Previously DataAggregator would store branch profile recorded in the split fragment in `FuncBranchData` associated with the fragment, and perform name translation in `getLocationName` for symbol name only. This works for fdata profile which is printed out as-is, but doesn't work with BAT YAML profile writer which requires a combined profile. The issue necessitated `fixupBATProfile` which partially addressed the issue (reassigned inter-fragment calls back into intra-function branches). However, `fixupBATProfile` fails to address disjoint profiles (i.e. doesn't merge `FuncBranchData` for fragments back into parent). This diff eliminates the need for `fixupBATProfile` by removing the root cause of the issue. Test Plan: NFC for existing tests Reviewers: ayermolo, dcci, rafaelauler, maksfb Reviewed By: maksfb Pull Request: https://github.com/llvm/llvm-project/pull/87569	2024-04-11 21:07:36 +02:00
Amir Ayupov	2d3c827c05	[BOLT] Use BAT for YAML profile call target information Provide a mechanism to resolve call target information for calls from non-BAT functions to BAT functions (`YAMLProfileWriter::convert`). Make it generic for future use in BAT-to-BAT calls. Test Plan: Updated bolt/test/X86/bolt-address-translation-yaml.test Reviewers: ayermolo, maksfb, rafaelauler, dcci Reviewed By: maksfb Pull Request: https://github.com/llvm/llvm-project/pull/86219	2024-04-05 16:08:59 -07:00
Amir Ayupov	213eda157a	[BOLT] Add CallSiteInfo entries in YAMLBAT (#76896 ) Attach call counters to YAML profile, covering inter-function control flow. Depends on: https://github.com/llvm/llvm-project/pull/86218 Test Plan: Updated bolt/test/X86/bolt-address-translation-yaml.test	2024-03-25 16:23:21 -07:00
Amir Ayupov	d7d2f7ca62	[BOLT] Emit intra-function control flow in YAMLBAT Attach branch counters to YAML profile, covering intra-function control flow. Depends on: https://github.com/llvm/llvm-project/pull/86353 Test Plan: Updated bolt/test/X86/bolt-address-translation-yaml.test Reviewers: rafaelauler, dcci, ayermolo, maksfb Reviewed By: rafaelauler Pull Request: https://github.com/llvm/llvm-project/pull/76911	2024-03-23 19:11:49 -07:00
Amir Ayupov	6280681137	[BOLT] Output basic YAML profile in BAT mode Relax assumptions that YAML output is not supported in BAT mode. Set up basic infrastructure for emitting YAML for functions not covered by BAT, such as from `.bolt.org.text` section (code identical to input binary sans external refs), or non-rewritten functions in non-relocation mode (where the function stays in the same section but BAT mapping is not emitted). This diff only produces YAML profile for non-BAT functions (skipped, non-simple). YAML profile for BAT functions is added in follow-up diffs: - https://github.com/llvm/llvm-project/pull/76911 emits YAML profile with internal control flow information only (branch profile), - https://github.com/llvm/llvm-project/pull/76896 adds cross-function profile (calls profile). Test Plan: Added bolt/test/X86/bolt-address-translation-yaml.test Reviewers: ayermolo, dcci, maksfb, rafaelauler Reviewed By: rafaelauler Pull Request: https://github.com/llvm/llvm-project/pull/76910	2024-03-21 14:32:13 -07:00
Maksim Panchenko	2abcbbd96a	[BOLT] Detect Linux kernel based on ELF program headers (#80086 ) Check if program header addresses fall into the kernel space to detect a Linux kernel binary on x86-64. Delete opts::LinuxKernelMode and use BinaryContext::IsLinuxKernel instead.	2024-01-30 18:04:29 -08:00
Kazu Hirata	ad8fd5b185	[BOLT] Use StringRef::{starts,ends}_with (NFC) This patch replaces uses of StringRef::{starts,ends}with with StringRef::{starts,ends}_with for consistency with std::{string,string_view}::{starts,ends}_with in C++20. I'm planning to deprecate and eventually remove StringRef::{starts,ends}with.	2023-12-13 23:34:49 -08:00
Jonathan Davies	22bea0c521	[BOLT] Add itrace aggregation for AUX data (#70426 ) If you have a perf.data with Arm ETM data the only way to use perf2bolt with Branch Aggregation is to first run `perf inject --itrace=l64i1us -o perf-brstack.data` and then pass the new perf-brstack.data into perf2bolt. perf2bolt then runs `perf script -F pid,ip,brstack` to produce the brstacks. This PR adds `--itrace` arg to perf2bolt to enable Itrace Aggregation. It takes a string which is what is passed to the `perf script -F pid,ip,brstack --itrace={0}`. This command produces the brstacks without having to run perf inject and creating a new perf.data file.	2023-11-06 12:40:04 +01:00
Jonathan Davies	5db75d74a1	[BOLT] Filter itrace from perf script mmap & task events (#69585 ) perf2bolt launches a few perf script commands and stores the output in temporary files before processing the output and cleaning them up before it exits. The command `perf script --show-mmap-events` outputs PERF_RECORD_MMAP2 and instruction tracing data but when processed it only looks for PERF_RECORD_MMAP2 and the instruction tracing data is ignored. This is fine for small amounts of instruction trace data but when I've recorded Arm ETM or Intel PT AUX I get lots of it. By adding `--no-itrace` it will just show the PERF_RECORD_MMAP2 records and will save on time running the `perf script`, disk space storing the output & time parsing the output. It is the same for `perf script --show-task-events` where BOLT is only interested in the PERF_RECORD_COMM & PERF_RECORD_FORK records. ### Data \| Perf Record \| Perf Data Size \| MMap Size \| MMap No Itrace Size \| \|---\|---\|---\|---\| \| perf record -e cs_etm/@tmc_etr0/u \| 137K \| 4468K \| 0.632K \| \| perf record -e intel_pt//u \| 890K \| 33378K \| 0.673K \|	2023-10-20 10:00:05 +02:00
Amir Ayupov	4627446d38	[BOLT] Fix AutoFDO output format after D154120 AutoFDO profile has no leading 0x in hex dumps. Reviewed By: #bolt, rafauler Differential Revision: https://reviews.llvm.org/D159507	2023-09-12 13:58:25 -07:00
Amir Ayupov	ffef4fe0db	[BOLT][NFC] Use formatv in DataAggregator/DataReader prints Reviewed By: #bolt, maksfb Differential Revision: https://reviews.llvm.org/D154120	2023-09-11 16:01:02 -07:00
Amir Ayupov	d796f36fbc	[BOLT][NFC] Simplify DataAggregator Use short loop instead of duplicating the code for setHasProfileAvailable. Reviewed By: #bolt, maksfb Differential Revision: https://reviews.llvm.org/D154749	2023-07-31 14:54:41 -07:00
Amir Ayupov	224e4cc516	[BOLT] Sort BranchData in DataAggregator Align perf reader to fdata behavior by sorting BranchData after reading samples, in the same way as DataReader: `20c66a0c66/bolt/lib/Profile/DataReader.cpp (L1239)` Namely, that order affects CallSiteInfo annotations which determine the construction order of CallGraph, which in turn affects function reordering. Reviewed By: #bolt, rafauler Differential Revision: https://reviews.llvm.org/D152731	2023-06-15 12:08:57 -07:00
Amir Ayupov	5acac7db6e	[BOLT][NFCI] Use StringRef.split in launchPerfProcess Use StringRef method instead of reimplementing the splitting. Incidentally, it also fixes the duplicate printing of the command arguments: ``` PERF2BOLT: spawning perf job to read branch events Launching perf: /usr/bin/perf script^@-F^@pid,ip,brstack -F^@pid,ip,brstack pid,ip,brstack -f -i PERF2BOLT: spawning perf job to read mem events Launching perf: /usr/bin/perf script^@-F^@pid,event,addr,ip -F^@pid,event,addr,ip pid,event,addr,ip -f -i PERF2BOLT: spawning perf job to read process events Launching perf: /usr/bin/perf script^@--show-mmap-events --show-mmap-events -f -i PERF2BOLT: spawning perf job to read task events Launching perf: /usr/bin/perf script^@--show-task-events --show-task-events -f -i ``` Fixes it to: ``` PERF2BOLT: spawning perf job to read branch events Launching perf: /usr/bin/perf script -F pid,ip,brstack -f -i PERF2BOLT: spawning perf job to read mem events Launching perf: /usr/bin/perf script -F pid,event,addr,ip -f -i PERF2BOLT: spawning perf job to read process events Launching perf: /usr/bin/perf script --show-mmap-events -f -i PERF2BOLT: spawning perf job to read task events Launching perf: /usr/bin/perf script --show-task-events -f -i ``` Reviewed By: #bolt, rafauler Differential Revision: https://reviews.llvm.org/D152483	2023-06-09 06:24:17 -07:00
Amir Ayupov	c061f75554	[BOLT] Handle recursive calls as inter-branches in DataAggregator Align yaml and fdata profiles by applying the same treatment to recursive calls (direct, indirect, tail). fdata profile increments entry count when handling recursive calls. Make perf/pre-aggregated perf reader (DataAggregator) do the same. Test Plan: In pre-aggregated-perf.test, add a dummy pre-aggregated branch entry between an indirect call in `frame_dummy` function and its entry point. Check that YAML profile gets incremented entry count for this function. End-to-end test: https://github.com/rafaelauler/bolt-tests/pull/24 Reviewed By: #bolt, maksfb Differential Revision: https://reviews.llvm.org/D152338	2023-06-08 04:17:07 -07:00
Amir Ayupov	713b28532e	[BOLT][NFC] Fix debug messages Fix debug printing, making it easier to compare two debug logs side by side: - `BinaryFunction::addRelocation`: print function name instead of `this` ptr, - `DataAggregator::doTrace`: remove duplicated function name. Reviewed By: #bolt, maksfb Differential Revision: https://reviews.llvm.org/D152314	2023-06-06 15:50:58 -07:00
Amir Ayupov	a478a09131	[BOLT][NFC] Drop MMap events for deleted files Don't parse/handle mmap events with "(deleted)" filename. Reviewed By: #bolt, rafauler Differential Revision: https://reviews.llvm.org/D151948	2023-06-05 13:03:40 -07:00
Amir Ayupov	bce889c8df	[BOLT] Align BranchInfo and FuncBranchData in DataAggregator::recordTrace `DataAggregator::recordTrace` serves two purposes: - Attaching LBR fallthrough ("trace") information to CFG (`getBranchInfo`), which eventually gets emitted as YAML profile. - Populating vector of offsets that gets added to `FuncBranchData`, which eventually gets emitted as fdata profile. `recordTrace` is invoked from `getFallthroughsInTrace` which checks its return status and passes on the collected vector of offsets to `doTrace`. However, if a malformed trace is passed to `recordTrace` it might partially attach the profile to CFG and exit with false, not propagating the vector of offsets to `doTrace`. This leads to a difference between fdata and yaml profile collected from the same binary and the same perf file. (Skylake LBR errata might produce such malformed traces where the last entry is duplicated, resulting in invalid fallthrough path between the last two entries). There are two ways to handle this mismatch: conservative (aligned with fdata), or aggressive (aligned with yaml). Conservative approach would discard the trace entirely, buffering the CFG updates until all fallthroughs are confirmed. Aggressive approach would apply CFG updates and return the matching fallthroughs in the vector even if the trace is invalid (doesn't correspond to a valid fallthrough path). I chose to go with the former (conservative/fdata) approach which produces more accurate profile. We can't rely on pre-filtering such traces early (in LBR sample processing) as DataAggregator is used for both perf samples and pre-aggregated perf information which loses branch stack information. Test Plan: https://github.com/rafaelauler/bolt-tests/pull/22 Reviewed By: #bolt, rafauler Differential Revision: https://reviews.llvm.org/D151614	2023-05-30 18:03:45 -07:00
Amir Ayupov	860543d96e	[BOLT][NFC] Extract DataAggregator::parseLBRSample Reviewed By: #bolt, rafauler Differential Revision: https://reviews.llvm.org/D150986	2023-05-19 17:50:02 -07:00
Amir Ayupov	17f3cbe3af	[BOLT][NFC] Use llvm::make_range Use `llvm::make_range` convenience wrapper from ADT. Reviewed By: #bolt, rafauler Differential Revision: https://reviews.llvm.org/D145887	2023-05-17 10:50:56 -07:00
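
The profile density definition given in 6ee5ff95ab above can be read as the following minimal Python sketch. This is an illustration of the steps listed in that commit message, not BOLT's implementation: `FunctionSample`, its field names, and `profile_density` are hypothetical names, and how BOLT attributes executed bytes to a function is not modeled here. Only the 99% coverage figure and the threshold of 60 are taken from the commit message.

```python
from dataclasses import dataclass
from typing import Iterable

# Threshold quoted in the commit message; described there as conservative.
DENSITY_THRESHOLD = 60


@dataclass
class FunctionSample:
    """Hypothetical per-function record used only for this sketch."""
    name: str
    executed_bytes: int  # dynamically executed function bytes
    size_bytes: int      # static function size in bytes
    samples: int         # raw sample count for the function


def profile_density(functions: Iterable[FunctionSample],
                    coverage_percent: int = 99) -> float:
    """Return the smallest function density covering the given share of samples."""
    # Functions with zero size have no defined density, so skip them.
    usable = [f for f in functions if f.size_bytes > 0]
    total = sum(f.samples for f in usable)
    if total == 0:
        return 0.0

    # Sort by density (executed bytes / static size) in decreasing order.
    ranked = sorted(usable,
                    key=lambda f: f.executed_bytes / f.size_bytes,
                    reverse=True)

    accumulated = 0
    density = 0.0
    for f in ranked:
        density = f.executed_bytes / f.size_bytes
        accumulated += f.samples
        # Integer comparison avoids floating-point rounding at the cutoff.
        if accumulated * 100 >= coverage_percent * total:
            break
    return density
```

A caller could compare the result against `DENSITY_THRESHOLD` to decide whether to suggest collecting more samples; the exact multiplier printed in the BOLT warning is not derived here.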


93 Commits