Provide a mechanism to resolve call target information for calls from non-BAT
functions to BAT functions (`YAMLProfileWriter::convert`). Make it generic for
future use in BAT-to-BAT calls.
Test Plan: Updated bolt/test/X86/bolt-address-translation-yaml.test
Reviewers: ayermolo, maksfb, rafaelauler, dcci
Reviewed By: maksfb
Pull Request: https://github.com/llvm/llvm-project/pull/86219
Attach call counters to YAML profile, covering inter-function control
flow.
Depends on: https://github.com/llvm/llvm-project/pull/86218
Test Plan:
Updated bolt/test/X86/bolt-address-translation-yaml.test
Relax assumptions that YAML output is not supported in BAT mode.
Set up basic infrastructure for emitting YAML for functions not covered
by BAT, such as from `.bolt.org.text` section (code identical to input binary
sans external refs), or non-rewritten functions in non-relocation mode (where
the function stays in the same section but BAT mapping is not emitted).
This diff only produces YAML profile for non-BAT functions (skipped,
non-simple). YAML profile for BAT functions is added in follow-up diffs:
- https://github.com/llvm/llvm-project/pull/76911 emits YAML profile with
internal control flow information only (branch profile),
- https://github.com/llvm/llvm-project/pull/76896 adds cross-function profile
(calls profile).
Test Plan: Added bolt/test/X86/bolt-address-translation-yaml.test
Reviewers: ayermolo, dcci, maksfb, rafaelauler
Reviewed By: rafaelauler
Pull Request: https://github.com/llvm/llvm-project/pull/76910
Check if program header addresses fall into the kernel space to detect a
Linux kernel binary on x86-64.
Delete opts::LinuxKernelMode and use BinaryContext::IsLinuxKernel
instead.
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.
I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
If you have a perf.data with Arm ETM data the only way to use perf2bolt
with Branch Aggregation is to first run `perf inject --itrace=l64i1us -o
perf-brstack.data` and then pass the new perf-brstack.data into
perf2bolt. perf2bolt then runs `perf script -F pid,ip,brstack` to
produce the brstacks.
This PR adds `--itrace` arg to perf2bolt to enable Itrace Aggregation.
It takes a string which is what is passed to the `perf script -F
pid,ip,brstack --itrace={0}`. This command produces the brstacks without
having to run perf inject and creating a new perf.data file.
perf2bolt launches a few perf script commands and stores the output in
temporary files before processing the output and cleaning them up before
it exits.
The command `perf script --show-mmap-events` outputs PERF_RECORD_MMAP2
and instruction tracing data but when processed it only looks for
PERF_RECORD_MMAP2 and the instruction tracing data is ignored. This is
fine for small amounts of instruction trace data but when I've recorded
Arm ETM or Intel PT AUX I get lots of it
By adding `--no-itrace` is will just show the PERF_RECORD_MMAP2 records
and will save on time running the `perf script`, disk space storing the
output & time parsing the output.
It is the same for `perf script --show-task-events` where BOLT is only
interested in the PERF_RECORD_COMM & PERF_RECORD_FORK records.
### Data
| Perf Record | Perf Data Size | MMap Size | MMap No Itrace Size |
|---|---|---|---|
| perf record -e cs_etm/@tmc_etr0/u | 137K | 4468K | 0.632K |
| perf record -e intel_pt//u | 890K | 33378K | 0.673K |
Use short loop instead of duplicating the code for setHasProfileAvailable.
Reviewed By: #bolt, maksfb
Differential Revision: https://reviews.llvm.org/D154749
Align perf reader to fdata behavior by sorting BranchData after reading samples,
in the same way as DataReader:
20c66a0c66/bolt/lib/Profile/DataReader.cpp (L1239)
Namely, that order affects CallSiteInfo annotations which determine the
construction order of CallGraph, which in turn affects function reordering.
Reviewed By: #bolt, rafauler
Differential Revision: https://reviews.llvm.org/D152731
Align yaml and fdata profiles by applying the same treatment to recursive
calls (direct, indirect, tail). fdata profile increments entry count when
handling recursive calls. Make perf/pre-aggregated perf reader (DataAggregator)
do the same.
Test Plan:
In pre-aggregated-perf.test, add a dummy pre-aggregated branch entry between
an indirect call in `frame_dummy` function and its entry point.
Check that YAML profile gets incremented entry count for this function.
End-to-end test: https://github.com/rafaelauler/bolt-tests/pull/24
Reviewed By: #bolt, maksfb
Differential Revision: https://reviews.llvm.org/D152338
Fix debug printing, making it easier to compare two debug logs side by side:
- `BinaryFunction::addRelocation`: print function name instead of `this` ptr,
- `DataAggregator::doTrace`: remove duplicated function name.
Reviewed By: #bolt, maksfb
Differential Revision: https://reviews.llvm.org/D152314
`DataAggregator::recordTrace` serves two purposes:
- Attaching LBR fallthrough ("trace") information to CFG (`getBranchInfo`),
which eventually gets emitted as YAML profile.
- Populating vector of offsets that gets added to `FuncBranchData`, which
eventually gets emitted as fdata profile.
`recordTrace` is invoked from `getFallthroughsInTrace` which checks its return
status and passes on the collected vector of offsets to `doTrace`.
However, if a malformed trace is passed to `recordTrace` it might partially
attach the profile to CFG and exit with false, not propagating the vector of
offsets to `doTrace`. This leads to a difference between fdata and yaml profile
collected from the same binary and the same perf file.
(Skylake LBR errata might produce such malformed traces where the last entry
is duplicated, resulting in invalid fallthrough path between the last two
entries).
There are two ways to handle this mismatch: conservative (aligned with fdata),
or aggressive (aligned with yaml). Conservative approach would discard the
trace entirely, buffering the CFG updates until all fallthroughs are confirmed.
Aggressive approach would apply CFG updates and return the matching
fallthroughs in the vector even if the trace is invalid (doesn't correspond to
a valid fallthrough path). I chose to go with the former (conservative/fdata)
approach which produces more accurate profile.
We can't rely on pre-filtering such traces early (in LBR sample processing) as
DataAggregator is used for both perf samples and pre-aggregated perf information
which loses branch stack information.
Test Plan: https://github.com/rafaelauler/bolt-tests/pull/22
Reviewed By: #bolt, rafauler
Differential Revision: https://reviews.llvm.org/D151614
Move out prepareToParse lambda, generalize it to handle mem events perf process.
Reviewed By: #bolt, rafauler
Differential Revision: https://reviews.llvm.org/D146002
Remove the usage of StringMap in places where the iteration order
affects the output since the iteration over StringMap is
non-deterministic.
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D145194
Similar to how `makeArrayRef` is deprecated in favor of deduction guides, do the
same for `makeMutableArrayRef`.
Once all of the places in-tree are using the deduction guides for
`MutableArrayRef`, we can mark `makeMutableArrayRef` as deprecated.
Differential Revision: https://reviews.llvm.org/D141814
No-LBR mode wasn't tested and slipped when mayHaveProfileData was added for
Lite mode. This enables processing of profiles collected without LBR and
converted with `perf2bolt -nl` option.
Test Plan:
bin/llvm-lit -a tools/bolt/test/X86/nolbr.s
https://github.com/rafaelauler/bolt-tests/pull/20
Reviewed By: #bolt, rafauler
Differential Revision: https://reviews.llvm.org/D140256
When the user does not have permissions to access the profile, consume
the error contained in Expected<> to avoid dumping stack to the user.
Differential Revision: https://reviews.llvm.org/D139480
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated. The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
This patch fixes:
bolt/lib/Profile/DataAggregator.cpp:264:66: error: no viable
conversion from 'Optional<llvm::StringRef>[3]' to
'ArrayRef<std::optional<StringRef>>'
This patch replaces those occurrences of NoneType that would trigger
an error if the definition of NoneType were missing in None.h.
To keep this patch focused, I am deliberately not replacing None with
std::nullopt in this patch or updating comments. They will be
addressed in subsequent patches.
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
Differential Revision: https://reviews.llvm.org/D138539
This patch replaces NoneType() and NoneType::None with None in
preparation for migration from llvm::Optional to std::optional.
In the std::optional world, we are not guranteed to be able to
default-construct std::nullopt_t or peek what's inside it, so neither
NoneType() nor NoneType::None has a corresponding expression in the
std::optional world.
Once we consistently use None, we should even be able to replace the
contents of llvm/include/llvm/ADT/None.h with something like:
using NoneType = std::nullopt_t;
inline constexpr std::nullopt_t None = std::nullopt;
to ease the migration from llvm::Optional to std::optional.
Differential Revision: https://reviews.llvm.org/D138376
In weird entries we were issueing a parse error. For example, in line 5 here:
6862acc063b0aa86595f52ff81628577df4296ff a.so
6862acc063b0aa86595f52ff81628577df4296ff a.so
6862acc063b0aa86595f52ff81628577df4296ff a.so
db758cb3c970044e78d5a4c99b011708a9995636 bin1
60326683eab31acfd03435d9ed4ff9a8 bin2
7d448e51851b4bdb33eac84f90e74628a14a5f00 b.so
742aa26e0211794356cc25f415c25230a26aa045 c.so
Error reading BOLT data input file: line 89, column 33: malformed field
Fix that.
Reviewed By: #bolt, Amir
Differential Revision: https://reviews.llvm.org/D134822
In perf2bolt and `-aggregate-only` BOLT mode, the output profile file is written
in fdata format by default. Provide a knob `-profile-format=[fdata,yaml]` to
control the format.
Note that `-w` option still dumps in YAML format.
Reviewed By: #bolt, maksfb
Differential Revision: https://reviews.llvm.org/D133995
A const-qualified reference to function layout allows accessing
non-const qualified basic blocks on a const-qualified function. This
patch adds or removes const-qualifiers where necessary to indicate where
basic blocks are used in a non-const manner.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D132049
A const-qualified reference to function layout allows accessing
non-const qualified basic blocks on a const-qualified function. This
patch adds or removes const-qualifiers where necessary to indicate where
basic blocks are used in a non-const manner.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D132049
This patch refactors BAT to be testable as a library, so we
can have open-source tests on it. This further fixes an issue with
basic blocks that lack a valid input offset, making BAT omit those
when writing translation tables.
Test Plan: new testcases added, new testing tool added (llvm-bat-dump)
Differential Revision: https://reviews.llvm.org/D129382
The latest perf tool can return non-empty buffer when executing
buildid-list command, even when perf.data was recorded with -B flag.
Some binaries will be listed without the ID, while others may have a
recorded ID. Allow invalid entires on the input, while checking the
valid ones for the match.
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D130223
This patch adds a dedicated class to keep track of each function's
layout. It also lays the groundwork for splitting functions into
multiple fragments (as opposed to a strict hot/cold split).
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D129518
This patch adds a new feature to bolt heatmap to print the hotness of each section in terms of the percentage of samples within that section.
Sample output generated for the clang binary:
Section Name, Begin Address, End Address, Percentage Hotness
.text, 0x1a7b9b0, 0x20a2cc0, 1.4709
.init, 0x20a2cc0, 0x20a2ce1, 0.0001
.fini, 0x20a2ce4, 0x20a2cf2, 0.0000
.text.unlikely, 0x20a2d00, 0x431990c, 0.3061
.text.hot, 0x4319910, 0x4bc6927, 97.2197
.text.startup, 0x4bc6930, 0x4c10c89, 0.0058
.plt, 0x4c10c90, 0x4c12010, 0.9974
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D124412