120 Commits

Author SHA1 Message Date
Amir Ayupov
b5af667b01 [BOLT] Map branch source address to the containing basic block in BAT YAML
Fix an issue where the profile for all branches that have a BRANCHENTRY
is dropped. If the branch has an entry in BAT, it will be translated to
its input offset. We used to only permit the basic block offset as a
branch source. Perform a lookup of containing basic block instead.
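
A minimal sketch of such a containing-block lookup, assuming block start offsets are kept in a sorted map and blocks are contiguous. `BlockOffsetMap` and `getContainingBlockOffset` are hypothetical names for illustration, not BOLT's API:

```cpp
#include <cstdint>
#include <iterator>
#include <map>
#include <optional>

// Hypothetical stand-in: maps each basic block's start offset to its index.
using BlockOffsetMap = std::map<uint32_t, unsigned>;

// Returns the start offset of the block that contains Offset, assuming each
// block extends up to the start of the next one.
std::optional<uint32_t> getContainingBlockOffset(const BlockOffsetMap &Blocks,
                                                 uint32_t Offset) {
  // First block that starts strictly after Offset.
  auto It = Blocks.upper_bound(Offset);
  if (It == Blocks.begin())
    return std::nullopt; // Offset precedes every known block.
  return std::prev(It)->first;
}
```

With blocks starting at 0, 16 and 40, a branch source at offset 20 resolves to the block at 16 instead of being rejected for not matching a block start.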

Test Plan: Updated bolt-address-translation-yaml.test

Reviewers: maksfb, dcci, rafaelauler, ayermolo

Reviewed By: maksfb

Pull Request: https://github.com/llvm/llvm-project/pull/91273
2024-05-12 17:11:09 -07:00
Amir Ayupov
4f127667ca [BOLT] Set entry counts in BAT YAML profile (#91775)
Align with DataReader::readProfile that sets entry block counts from
FuncBranchData->EntryData.

Test Plan: updated bolt-address-translation-yaml.test
2024-05-10 22:23:45 -07:00
Amir Ayupov
bbcdd4f4b2 [BOLT] Use disambiguated local names in BAT YAML
Align BAT YAML to fdata profile.

Test Plan: updated register-fragments-bolt-symbols.s

Reviewers: dcci, rafaelauler, ayermolo, maksfb

Reviewed By: dcci

Pull Request: https://github.com/llvm/llvm-project/pull/91773
2024-05-10 22:18:50 -07:00
Amir Ayupov
db29f20fdd [BOLT] Ignore returns in DataAggregator
Returns are ignored in perf/pre-aggregated/fdata profile reader (see
DataReader::convertBranchData). They are also omitted in
YAMLProfileWriter by virtue of not having the profile attached to them
in the reader, and YAMLProfileWriter converting the profile attached to
BinaryFunctions. Thus, return profile is universally ignored across all
profile types except BAT YAML.

To make returns ignored for YAML produced in BAT mode, we can:
1) ignore them in YAMLProfileReader,
2) omit them from YAML profile in profile conversion/writing.

The first option is prone to profile staleness issue, where the profiled
binary doesn't match the one to be optimized, and thus returns in the
profile can no longer be reliably detected (as we don't distinguish them
from calls in the profile).

The second option is robust to staleness but requires disassembling the
branch source instruction.

Test Plan: Updated bolt-address-translation-yaml.test

Reviewers: rafaelauler, dcci, ayermolo, maksfb

Reviewed By: maksfb

Pull Request: https://github.com/llvm/llvm-project/pull/90807
2024-05-08 12:02:18 -07:00
Amir Ayupov
f2d7130579 [BOLT][NFC] Simplify DataAggregator::getFallthroughsInTrace (#90752)
2024-05-01 21:53:49 +02:00
Amir Ayupov
5fb59e7447 [BOLT] Print program stats in perf2bolt/aggregate-only mode (#89763)
2024-04-25 19:08:51 +02:00
Amir Ayupov
3997f0eb81 [BOLT] Cover all call sites in writeBATYAML
Call site information setting was conditioned on branch information
presence for a given block. However, it's possible to have sampled
profile lacking one or the other for a given basic block.

Iterate over branch profiles and call profiles independently to cover
all recorded profile data.
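
The idea can be sketched as two independent passes, so a block with only call data (or only branch data) is still recorded. All types and names below are hypothetical simplifications, not the structures used by `writeBATYAML`:

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical, simplified records keyed by the owning block's offset.
struct BranchRecord { uint32_t BlockOffset; uint64_t Count; };
struct CallRecord { uint32_t BlockOffset; uint64_t Count; };
struct BlockProfile { uint64_t BranchCount = 0; uint64_t CallCount = 0; };

std::map<uint32_t, BlockProfile>
buildBlockProfiles(const std::vector<BranchRecord> &Branches,
                   const std::vector<CallRecord> &Calls) {
  std::map<uint32_t, BlockProfile> Blocks;
  // Pass 1: branch profile, regardless of whether the block has calls.
  for (const BranchRecord &BR : Branches)
    Blocks[BR.BlockOffset].BranchCount += BR.Count;
  // Pass 2: call profile, regardless of whether the block has branches.
  for (const CallRecord &CR : Calls)
    Blocks[CR.BlockOffset].CallCount += CR.Count;
  return Blocks;
}
```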

Depends on https://github.com/llvm/llvm-project/pull/87569

Test Plan: Updated bolt/test/X86/yaml-secondary-entry-discriminator.s

Reviewers: ayermolo, dcci, maksfb, rafaelauler

Reviewed By: maksfb

Pull Request: https://github.com/llvm/llvm-project/pull/87743
2024-04-11 21:15:04 +02:00
Amir Ayupov
8840992667 [BOLT][BAT] Fix handling of split functions
Move BAT parent function lookup outside `getLocationName`, to the
scope where we retrieve `FuncBranchData` linked with the function.

Previously DataAggregator would store branch profile recorded in the
split fragment in `FuncBranchData` associated with the fragment, and
perform name translation in `getLocationName` for symbol name only.
This works for fdata profile which is printed out as-is, but doesn't
work with BAT YAML profile writer which requires a combined profile.

The issue necessitated `fixupBATProfile` which partially addressed the
issue (reassigned inter-fragment calls back into intra-function
branches). However, `fixupBATProfile` fails to address disjoint
profiles (i.e. doesn't merge `FuncBranchData` for fragments back
into parent). This diff eliminates the need for `fixupBATProfile` by
removing the root cause of the issue.

Test Plan: NFC for existing tests

Reviewers: ayermolo, dcci, rafaelauler, maksfb

Reviewed By: maksfb

Pull Request: https://github.com/llvm/llvm-project/pull/87569
2024-04-11 21:07:36 +02:00
Amir Ayupov
2d3c827c05 [BOLT] Use BAT for YAML profile call target information
Provide a mechanism to resolve call target information for calls from non-BAT
functions to BAT functions (`YAMLProfileWriter::convert`). Make it generic for
future use in BAT-to-BAT calls.

Test Plan: Updated bolt/test/X86/bolt-address-translation-yaml.test

Reviewers: ayermolo, maksfb, rafaelauler, dcci

Reviewed By: maksfb

Pull Request: https://github.com/llvm/llvm-project/pull/86219
2024-04-05 16:08:59 -07:00
Amir Ayupov
213eda157a [BOLT] Add CallSiteInfo entries in YAMLBAT (#76896)
Attach call counters to YAML profile, covering inter-function control
flow.

Depends on: https://github.com/llvm/llvm-project/pull/86218

Test Plan: Updated bolt/test/X86/bolt-address-translation-yaml.test
2024-03-25 16:23:21 -07:00
Amir Ayupov
d7d2f7ca62 [BOLT] Emit intra-function control flow in YAMLBAT
Attach branch counters to YAML profile, covering intra-function control
flow.

Depends on: https://github.com/llvm/llvm-project/pull/86353

Test Plan: Updated bolt/test/X86/bolt-address-translation-yaml.test

Reviewers: rafaelauler, dcci, ayermolo, maksfb

Reviewed By: rafaelauler

Pull Request: https://github.com/llvm/llvm-project/pull/76911
2024-03-23 19:11:49 -07:00
Amir Ayupov
6280681137 [BOLT] Output basic YAML profile in BAT mode
Relax assumptions that YAML output is not supported in BAT mode.
Set up basic infrastructure for emitting YAML for functions not covered
by BAT, such as from `.bolt.org.text` section (code identical to input binary
sans external refs), or non-rewritten functions in non-relocation mode (where
the function stays in the same section but BAT mapping is not emitted).

This diff only produces YAML profile for non-BAT functions (skipped,
non-simple). YAML profile for BAT functions is added in follow-up diffs:
- https://github.com/llvm/llvm-project/pull/76911 emits YAML profile with
  internal control flow information only (branch profile),
- https://github.com/llvm/llvm-project/pull/76896 adds cross-function profile
  (calls profile).

Test Plan: Added bolt/test/X86/bolt-address-translation-yaml.test

Reviewers: ayermolo, dcci, maksfb, rafaelauler

Reviewed By: rafaelauler

Pull Request: https://github.com/llvm/llvm-project/pull/76910
2024-03-21 14:32:13 -07:00
Maksim Panchenko
2abcbbd96a [BOLT] Detect Linux kernel based on ELF program headers (#80086)
Check if program header addresses fall into the kernel space to detect a
Linux kernel binary on x86-64.

Delete opts::LinuxKernelMode and use BinaryContext::IsLinuxKernel
instead.
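
A rough sketch of the address check, under two assumptions that the message does not spell out: the kernel-space boundary constant used here, and that a single program header in kernel space is enough. `ProgramHeader` and `looksLikeLinuxKernel` are hypothetical names:

```cpp
#include <cstdint>
#include <vector>

// Assumed lower bound of the x86-64 kernel mapping; the commit message does
// not state the exact constant.
constexpr uint64_t KernelStartX86_64 = 0xFFFFFFFF80000000ULL;

// Hypothetical, reduced view of an ELF program header.
struct ProgramHeader { uint64_t VAddr; };

// Treats the binary as a Linux kernel if any program header is mapped at or
// above the assumed kernel-space boundary.
bool looksLikeLinuxKernel(const std::vector<ProgramHeader> &Phdrs) {
  for (const ProgramHeader &Phdr : Phdrs)
    if (Phdr.VAddr >= KernelStartX86_64)
      return true;
  return false;
}
```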
2024-01-30 18:04:29 -08:00
Kazu Hirata
ad8fd5b185 [BOLT] Use StringRef::{starts,ends}_with (NFC)
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.

I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
2023-12-13 23:34:49 -08:00
Jonathan Davies
22bea0c521 [BOLT] Add itrace aggregation for AUX data (#70426)
If you have a perf.data with Arm ETM data the only way to use perf2bolt
with Branch Aggregation is to first run `perf inject --itrace=l64i1us -o
perf-brstack.data` and then pass the new perf-brstack.data into
perf2bolt. perf2bolt then runs `perf script -F pid,ip,brstack` to
produce the brstacks.

This PR adds `--itrace` arg to perf2bolt to enable Itrace Aggregation.
It takes a string which is what is passed to the `perf script -F
pid,ip,brstack --itrace={0}`. This command produces the brstacks without
having to run perf inject and creating a new perf.data file.
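
As an illustration only, the argument string could be assembled like this. `buildBrstackArgs` is a hypothetical helper that uses plain string concatenation; it is not the code in perf2bolt:

```cpp
#include <string>

// Builds the `perf script` arguments for branch-stack aggregation. When an
// itrace string is supplied it is appended as `--itrace=<value>`; otherwise
// the plain brstack command is returned.
std::string buildBrstackArgs(const std::string &ItraceArg) {
  std::string Args = "script -F pid,ip,brstack";
  if (!ItraceArg.empty())
    Args += " --itrace=" + ItraceArg;
  return Args;
}
```

For example, passing `l64i1us` yields `script -F pid,ip,brstack --itrace=l64i1us`.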
2023-11-06 12:40:04 +01:00
Jonathan Davies
5db75d74a1 [BOLT] Filter itrace from perf script mmap & task events (#69585)
perf2bolt launches a few perf script commands and stores the output in
temporary files before processing the output and cleaning them up before
it exits.

The command `perf script --show-mmap-events` outputs PERF_RECORD_MMAP2
and instruction tracing data but when processed it only looks for
PERF_RECORD_MMAP2 and the instruction tracing data is ignored. This is
fine for small amounts of instruction trace data, but when I've recorded
Arm ETM or Intel PT AUX data I get lots of it.

Adding `--no-itrace` makes it show only the PERF_RECORD_MMAP2 records,
which saves time running `perf script`, disk space storing the
output, and time parsing the output.

It is the same for `perf script --show-task-events` where BOLT is only
interested in the PERF_RECORD_COMM & PERF_RECORD_FORK records.

### Data

| Perf Record | Perf Data Size  | MMap Size | MMap No Itrace Size |
|---|---|---|---|
| perf record -e cs_etm/@tmc_etr0/u | 137K | 4468K | 0.632K |
| perf record -e intel_pt//u | 890K | 33378K | 0.673K |
2023-10-20 10:00:05 +02:00
Amir Ayupov
4627446d38 [BOLT] Fix AutoFDO output format after D154120
AutoFDO profile has no leading 0x in hex dumps.

Reviewed By: #bolt, rafauler

Differential Revision: https://reviews.llvm.org/D159507
2023-09-12 13:58:25 -07:00
Amir Ayupov
ffef4fe0db [BOLT][NFC] Use formatv in DataAggregator/DataReader prints
Reviewed By: #bolt, maksfb

Differential Revision: https://reviews.llvm.org/D154120
2023-09-11 16:01:02 -07:00
Amir Ayupov
d796f36fbc [BOLT][NFC] Simplify DataAggregator
Use short loop instead of duplicating the code for setHasProfileAvailable.

Reviewed By: #bolt, maksfb

Differential Revision: https://reviews.llvm.org/D154749
2023-07-31 14:54:41 -07:00
Amir Ayupov
224e4cc516 [BOLT] Sort BranchData in DataAggregator
Align perf reader to fdata behavior by sorting BranchData after reading samples,
in the same way as DataReader:
20c66a0c66/bolt/lib/Profile/DataReader.cpp (L1239)

Namely, that order affects CallSiteInfo annotations which determine the
construction order of CallGraph, which in turn affects function reordering.
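
A sketch of the kind of sort involved, assuming entries are ordered by source and then target offset. The `BranchEntry` type and the comparison key are illustrative assumptions; the actual ordering is the one defined in DataReader:

```cpp
#include <algorithm>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical, reduced branch record.
struct BranchEntry {
  uint64_t FromOffset;
  uint64_t ToOffset;
  uint64_t Count;
};

// Sorts in place with a stable sort so entries with equal keys keep the order
// in which the samples were read.
void sortBranchData(std::vector<BranchEntry> &Data) {
  std::stable_sort(Data.begin(), Data.end(),
                   [](const BranchEntry &A, const BranchEntry &B) {
                     return std::tie(A.FromOffset, A.ToOffset) <
                            std::tie(B.FromOffset, B.ToOffset);
                   });
}
```

Sorting once after all samples are read makes any consumer that walks the data in order see the same sequence for both profile sources.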

Reviewed By: #bolt, rafauler

Differential Revision: https://reviews.llvm.org/D152731
2023-06-15 12:08:57 -07:00
Amir Ayupov
5acac7db6e [BOLT][NFCI] Use StringRef.split in launchPerfProcess
Use StringRef method instead of reimplementing the splitting.
Incidentally, it also fixes the duplicate printing of the command arguments:
```
PERF2BOLT: spawning perf job to read branch events
Launching perf: /usr/bin/perf script^@-F^@pid,ip,brstack -F^@pid,ip,brstack pid,ip,brstack -f -i
PERF2BOLT: spawning perf job to read mem events
Launching perf: /usr/bin/perf script^@-F^@pid,event,addr,ip -F^@pid,event,addr,ip pid,event,addr,ip -f -i
PERF2BOLT: spawning perf job to read process events
Launching perf: /usr/bin/perf script^@--show-mmap-events --show-mmap-events -f -i
PERF2BOLT: spawning perf job to read task events
Launching perf: /usr/bin/perf script^@--show-task-events --show-task-events -f -i
```

Fixes it to:
```
PERF2BOLT: spawning perf job to read branch events
Launching perf: /usr/bin/perf script -F pid,ip,brstack -f -i
PERF2BOLT: spawning perf job to read mem events
Launching perf: /usr/bin/perf script -F pid,event,addr,ip -f -i
PERF2BOLT: spawning perf job to read process events
Launching perf: /usr/bin/perf script --show-mmap-events -f -i
PERF2BOLT: spawning perf job to read task events
Launching perf: /usr/bin/perf script --show-task-events -f -i
```
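
For illustration, whitespace splitting of the argument string can be sketched with the standard library as a stand-in for the `StringRef` method. `splitArgs` is a hypothetical helper, not the code in `launchPerfProcess`:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Splits a command-line argument string on whitespace, producing one entry
// per argument and no duplicated or NUL-joined pieces.
std::vector<std::string> splitArgs(const std::string &Args) {
  std::vector<std::string> Argv;
  std::istringstream Stream(Args);
  std::string Token;
  while (Stream >> Token)
    Argv.push_back(Token);
  return Argv;
}
```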

Reviewed By: #bolt, rafauler

Differential Revision: https://reviews.llvm.org/D152483
2023-06-09 06:24:17 -07:00
Amir Ayupov
c061f75554 [BOLT] Handle recursive calls as inter-branches in DataAggregator
Align yaml and fdata profiles by applying the same treatment to recursive
calls (direct, indirect, tail). fdata profile increments entry count when
handling recursive calls. Make perf/pre-aggregated perf reader (DataAggregator)
do the same.
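
A simplified sketch of the treatment, assuming a recursive call shows up as a branch whose source and target are in the same function and whose target is the function entry (offset 0). The types and names are hypothetical:

```cpp
#include <cstdint>

// Hypothetical, reduced per-function profile.
struct FuncProfile {
  uint64_t EntryCount = 0;
  uint64_t IntraBranchCount = 0;
};

// Records a branch whose source and target lie in the same function.
// A target at offset 0 is assumed to be a recursive call and is counted as a
// function entry, like a call coming from another function.
void recordSameFunctionBranch(FuncProfile &Profile, uint64_t ToOffset,
                              uint64_t Count) {
  if (ToOffset == 0)
    Profile.EntryCount += Count;
  else
    Profile.IntraBranchCount += Count;
}
```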

Test Plan:
In pre-aggregated-perf.test, add a dummy pre-aggregated branch entry between
an indirect call in `frame_dummy` function and its entry point.
Check that YAML profile gets incremented entry count for this function.

End-to-end test: https://github.com/rafaelauler/bolt-tests/pull/24

Reviewed By: #bolt, maksfb

Differential Revision: https://reviews.llvm.org/D152338
2023-06-08 04:17:07 -07:00
Amir Ayupov
713b28532e [BOLT][NFC] Fix debug messages
Fix debug printing, making it easier to compare two debug logs side by side:
- `BinaryFunction::addRelocation`: print function name instead of `this` ptr,
- `DataAggregator::doTrace`: remove duplicated function name.

Reviewed By: #bolt, maksfb

Differential Revision: https://reviews.llvm.org/D152314
2023-06-06 15:50:58 -07:00
Amir Ayupov
a478a09131 [BOLT][NFC] Drop MMap events for deleted files
Don't parse/handle mmap events with "(deleted)" filename.

Reviewed By: #bolt, rafauler

Differential Revision: https://reviews.llvm.org/D151948
2023-06-05 13:03:40 -07:00
Amir Ayupov
bce889c8df [BOLT] Align BranchInfo and FuncBranchData in DataAggregator::recordTrace
`DataAggregator::recordTrace` serves two purposes:
  - Attaching LBR fallthrough ("trace") information to CFG (`getBranchInfo`),
    which eventually gets emitted as YAML profile.
  - Populating vector of offsets that gets added to `FuncBranchData`, which
    eventually gets emitted as fdata profile.

`recordTrace` is invoked from `getFallthroughsInTrace` which checks its return
status and passes on the collected vector of offsets to `doTrace`.

However, if a malformed trace is passed to `recordTrace` it might partially
attach the profile to CFG and exit with false, not propagating the vector of
offsets to `doTrace`. This leads to a difference between fdata and yaml profile
collected from the same binary and the same perf file.

(Skylake LBR errata might produce such malformed traces where the last entry
is duplicated, resulting in invalid fallthrough path between the last two
entries).

There are two ways to handle this mismatch: conservative (aligned with fdata),
or aggressive (aligned with yaml). Conservative approach would discard the
trace entirely, buffering the CFG updates until all fallthroughs are confirmed.
Aggressive approach would apply CFG updates and return the matching
fallthroughs in the vector even if the trace is invalid (doesn't correspond to
a valid fallthrough path). I chose to go with the former (conservative/fdata)
approach which produces more accurate profile.
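
The conservative approach can be sketched as follows: updates are buffered and committed only once every step of the trace is confirmed. The edge representation and the `ValidFallthroughs` set are hypothetical simplifications of the CFG checks in `recordTrace`:

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <utility>
#include <vector>

struct Edge { uint32_t From; uint32_t To; };
using EdgeKey = std::pair<uint32_t, uint32_t>;
using EdgeCounts = std::map<EdgeKey, uint64_t>;

// Applies a trace to Counts only if every edge is a valid fallthrough.
// Returns false and leaves Counts untouched for a malformed trace.
bool recordTraceBuffered(const std::vector<Edge> &Path,
                         const std::set<EdgeKey> &ValidFallthroughs,
                         EdgeCounts &Counts, uint64_t Count) {
  std::vector<EdgeKey> Pending;
  for (const Edge &E : Path) {
    EdgeKey Key(E.From, E.To);
    if (!ValidFallthroughs.count(Key))
      return false; // Discard the whole trace; nothing was committed.
    Pending.push_back(Key);
  }
  // All fallthroughs confirmed: commit the buffered updates.
  for (const EdgeKey &Key : Pending)
    Counts[Key] += Count;
  return true;
}
```

Because nothing is written until the loop finishes, a rejected trace cannot leave a partial update behind, which is the source of the fdata/yaml mismatch described above.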

We can't rely on pre-filtering such traces early (in LBR sample processing) as
DataAggregator is used for both perf samples and pre-aggregated perf information
which loses branch stack information.

Test Plan: https://github.com/rafaelauler/bolt-tests/pull/22

Reviewed By: #bolt, rafauler

Differential Revision: https://reviews.llvm.org/D151614
2023-05-30 18:03:45 -07:00
Amir Ayupov
860543d96e [BOLT][NFC] Extract DataAggregator::parseLBRSample
Reviewed By: #bolt, rafauler

Differential Revision: https://reviews.llvm.org/D150986
2023-05-19 17:50:02 -07:00
Amir Ayupov
17f3cbe3af [BOLT][NFC] Use llvm::make_range
Use `llvm::make_range` convenience wrapper from ADT.

Reviewed By: #bolt, rafauler

Differential Revision: https://reviews.llvm.org/D145887
2023-05-17 10:50:56 -07:00
Amir Ayupov
c7af4f383d [BOLT][NFC] Simplify preprocessProfile
Move out prepareToParse lambda, generalize it to handle mem events perf process.

Reviewed By: #bolt, rafauler

Differential Revision: https://reviews.llvm.org/D146002
2023-03-15 12:56:06 -07:00
Maksim Panchenko
73b89e3f38 [BOLT] Remove dependency on StringMap iteration order
Remove the usage of StringMap in places where the iteration order
affects the output since the iteration over StringMap is
non-deterministic.
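
One generic way to remove such a dependency, shown with `std::unordered_map` as a stand-in for `StringMap`, is to sort the keys before producing output. This is only an illustration of the idea; the patch itself may replace the container instead:

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Returns the keys in sorted order, so anything emitted by walking the result
// does not depend on the hash table's iteration order.
std::vector<std::string>
sortedNames(const std::unordered_map<std::string, uint64_t> &Counts) {
  std::vector<std::string> Names;
  Names.reserve(Counts.size());
  for (const auto &Entry : Counts)
    Names.push_back(Entry.first);
  std::sort(Names.begin(), Names.end());
  return Names;
}
```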

Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D145194
2023-03-03 09:21:26 -08:00
Amir Ayupov
4a7966ea1b [BOLT][NFC] DataAggregator code cleanup
Reviewed By: #bolt, rafauler

Differential Revision: https://reviews.llvm.org/D139794
2023-01-18 13:44:44 -08:00
Joe Loser
a288d7f937 [llvm][ADT] Replace uses of makeMutableArrayRef with deduction guides
Similar to how `makeArrayRef` is deprecated in favor of deduction guides, do the
same for `makeMutableArrayRef`.

Once all of the places in-tree are using the deduction guides for
`MutableArrayRef`, we can mark `makeMutableArrayRef` as deprecated.

Differential Revision: https://reviews.llvm.org/D141814
2023-01-16 14:49:37 -07:00
Amir Ayupov
6b05a62a6b [BOLT] Check no-LBR samples in mayHaveProfileData
No-LBR mode wasn't tested and slipped through when mayHaveProfileData was added for
Lite mode. This enables processing of profiles collected without LBR and
converted with `perf2bolt -nl` option.

Test Plan:
bin/llvm-lit -a tools/bolt/test/X86/nolbr.s
https://github.com/rafaelauler/bolt-tests/pull/20

Reviewed By: #bolt, rafauler

Differential Revision: https://reviews.llvm.org/D140256
2023-01-03 14:43:36 -08:00
Matt Arsenault
765f3cafa1 bolt: Update more sys::Wait calls
2022-12-14 12:00:41 -05:00
Matt Arsenault
6be2db6ca5 bolt: Try to fix build after sys::Program API change
Hopefully fixes build after 15a6e3c636
2022-12-14 11:56:13 -05:00
Amir Ayupov
e8f5743e86 [BOLT][NFC] Use std::optional in BC
2022-12-11 22:13:46 -08:00
Amir Ayupov
835a9c2801 [BOLT][NFC] Use std::optional in DataAggregator
2022-12-11 22:13:46 -08:00
Amir Ayupov
3d573fdbb4 [BOLT][NFC] Use std::optional in BAT
2022-12-11 22:13:46 -08:00
Maksim Panchenko
0f915826cc [BOLT] Handle access errors while reading profile
When the user does not have permissions to access the profile, consume
the error contained in Expected<> to avoid dumping stack to the user.

Differential Revision: https://reviews.llvm.org/D139480
2022-12-07 17:11:30 -08:00
Krzysztof Parzyszek
3c255f679c Process: convert Optional to std::optional
This applies to GetEnv and FindInEnvPath.
2022-12-06 09:56:14 -08:00
Kazu Hirata
e324a80fab [BOLT] Use std::nullopt instead of None (NFC)
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated.  The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.

This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-02 23:12:38 -08:00
Kazu Hirata
1028b165ee [BOLT] Fix a build error
This patch fixes:

  bolt/lib/Profile/DataAggregator.cpp:264:66: error: no viable
  conversion from 'Optional<llvm::StringRef>[3]' to
  'ArrayRef<std::optional<StringRef>>'
2022-12-01 15:48:03 -08:00
Kazu Hirata
34bcadc38c Use std::nullopt_t instead of NoneType (NFC)
This patch replaces those occurrences of NoneType that would trigger
an error if the definition of NoneType were missing in None.h.

To keep this patch focused, I am deliberately not replacing None with
std::nullopt in this patch or updating comments.  They will be
addressed in subsequent patches.

This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716

Differential Revision: https://reviews.llvm.org/D138539
2022-11-23 14:16:04 -08:00
Kazu Hirata
1fa870b1bd Use None consistently (NFC)
This patch replaces NoneType() and NoneType::None with None in
preparation for migration from llvm::Optional to std::optional.

In the std::optional world, we are not guaranteed to be able to
default-construct std::nullopt_t or peek what's inside it, so neither
NoneType() nor NoneType::None has a corresponding expression in the
std::optional world.

Once we consistently use None, we should even be able to replace the
contents of llvm/include/llvm/ADT/None.h with something like:

  using NoneType = std::nullopt_t;
  inline constexpr std::nullopt_t None = std::nullopt;

to ease the migration from llvm::Optional to std::optional.

Differential Revision: https://reviews.llvm.org/D138376
2022-11-20 00:24:40 -08:00
Rafael Auler
ba9cc6537c [PERF2BOLT] Fix unittest failure
Fix failure caused by commit e549ac072b "Do not issue parsing error on
weird build ids".
2022-09-28 16:01:57 -07:00
Rafael Auler
e549ac072b [PERF2BOLT] Do not issue parsing error on weird build ids
In weird entries we were issuing a parse error. For example, in line 5 here:

6862acc063b0aa86595f52ff81628577df4296ff a.so
6862acc063b0aa86595f52ff81628577df4296ff a.so
6862acc063b0aa86595f52ff81628577df4296ff a.so
db758cb3c970044e78d5a4c99b011708a9995636 bin1
60326683eab31acfd03435d9ed4ff9a8         bin2
7d448e51851b4bdb33eac84f90e74628a14a5f00 b.so
742aa26e0211794356cc25f415c25230a26aa045 c.so

Error reading BOLT data input file: line 89, column 33: malformed field

Fix that.
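
A sketch of a tolerant parser for such lines, assuming the format is a hex build id, one or more whitespace characters, then the file name. `parseBuildIDLine` is a hypothetical helper and not the parser used by perf2bolt:

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>

// Parses "<hex-id><whitespace><name>". Accepts build ids of any length and
// any amount of padding, so short ids do not cause a "malformed field" error.
std::optional<std::pair<std::string, std::string>>
parseBuildIDLine(const std::string &Line) {
  std::size_t IdEnd = 0;
  while (IdEnd < Line.size() &&
         std::isxdigit(static_cast<unsigned char>(Line[IdEnd])))
    ++IdEnd;
  if (IdEnd == 0)
    return std::nullopt; // No build id at the start of the line.
  std::size_t NameStart = IdEnd;
  while (NameStart < Line.size() &&
         std::isspace(static_cast<unsigned char>(Line[NameStart])))
    ++NameStart;
  if (NameStart == IdEnd || NameStart == Line.size())
    return std::nullopt; // Missing separator or missing file name.
  return std::make_pair(Line.substr(0, IdEnd), Line.substr(NameStart));
}
```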

Reviewed By: #bolt, Amir

Differential Revision: https://reviews.llvm.org/D134822
2022-09-28 14:41:55 -07:00
Amir Ayupov
39336fc09c [BOLT] Control aggregation mode output profile file format
In perf2bolt and `-aggregate-only` BOLT mode, the output profile file is written
in fdata format by default. Provide a knob `-profile-format=[fdata,yaml]` to
control the format.
Note that `-w` option still dumps in YAML format.

Reviewed By: #bolt, maksfb

Differential Revision: https://reviews.llvm.org/D133995
2022-09-19 13:37:10 -07:00
Kazu Hirata
20f0f15a40 Use StringRef::contains (NFC)
2022-08-28 23:29:02 -07:00
Amir Ayupov
f119a2483d [BOLT][NFC] Use llvm::any_of
Replace the imperative pattern of the following kind
```
bool IsTrue = false;
for (const auto &Element : Range) {
  if (Condition(Element)) {
    IsTrue = true;
    break;
  }
}
```
with functional style `llvm::any_of`:
```
bool IsTrue = llvm::any_of(Range, [&](const auto &Element) {
  return Condition(Element);
});
```

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D132276
2022-08-27 21:36:15 -07:00
Fabian Parzefall
d5c03def24 [BOLT] Towards FunctionLayout const-correctness
A const-qualified reference to function layout allows accessing
non-const qualified basic blocks on a const-qualified function. This
patch adds or removes const-qualifiers where necessary to indicate where
basic blocks are used in a non-const manner.

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D132049
2022-08-24 16:32:33 -07:00
Fabian Parzefall
f24c299e7d Revert "[BOLT] Towards FunctionLayout const-correctness"
This reverts commit 587d265342.
2022-08-24 10:51:38 -07:00