We compute BF hashes in `YAMLProfileReader::readProfile` when first
matching profile functions with binary functions, and second time in
`YAMLProfileReader::parseFunctionProfile` during the profile assignment
(we need to do that to account for LTO private functions with
mismatching suffix).
Avoid recomputing the hash if it's been set.
YAML profile for non-simple functions without CFG is
1) useless for optimizations,
2) can't be attached, similar to fdata profile,
3) would be reported as invalid/stale even if the profile is valid.
Don't attempt to attach the profile in this case, aligning the behavior
to DataReader.
Test Plan: added yaml-non-simple.test
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.
I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
Provide backwards compatibility for YAML profile that uses `std::hash`:
xxh3 hash is the default for newly produced profile (sets `std-hash:
false`),
whereas the profile that doesn't specify `std-hash` will be treated as
`std-hash: true`, preserving old behavior.
D159460 regressed the bugfix in D156644. Fix that and emit a warning.
Add a test case.
Reviewed By: #bolt, maksfb
Differential Revision: https://reviews.llvm.org/D159529
Reduce YAML profile processing times:
- preprocessProfile: speed up buildNameMaps by replacing ProfileNameToProfile
mapping with ProfileFunctionNames set and ProfileBFs vector.
Pre-look up YamlBF->BF correspondence, memoize in ProfileBFs.
- readProfile: replace iteration over all functions in the binary by iteration
over profile functions (strict match and LTO name match).
On a large binary (1.9M functions) and large YAML profile (121MB, 30k functions)
reduces profile steps runtime:
pre-process profile data: 12.4953s -> 10.7123s
process profile data: 9.8195s -> 5.6639s
Compared to fdata profile reading:
pre-process profile data: 8.0268s
process profile data: 1.0265s
process profile data pre-CFG: 0.1644s
Reviewed By: #bolt, maksfb
Differential Revision: https://reviews.llvm.org/D159460
Work around the issue of multiple profiles per function.
Can happen with a stale profile which has separate profiles
that in a new binary got merged and became aliases.
Reviewed By: #bolt, maksfb
Differential Revision: https://reviews.llvm.org/D156644
1. Using ADT/Bitfields.h for hash computation; this is equivalent but shorter than the existing implementation
2. Getting rid of Layout indices for stale matching; using BB->getIndex for indexing
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D155748
Specify blocks order used in YAML profile. Needed to ensure profile backwards
compatibility with pre-D155514 DFS order by default.
Reviewed By: #bolt, maksfb
Differential Revision: https://reviews.llvm.org/D156176
Use layout order in YAML profile reading/writing. Preserve old behavior (DFS order)
under `-profile-use-dfs` option.
Reviewed By: spupyrev
Differential Revision: https://reviews.llvm.org/D155514
BOLT often has to deal with profiles collected on binaries built from several
revisions behind release. As a result, a certain percentage of functions is
considered stale and not optimized. This diff adds an ability to match profile
to functions that are not 100% binary identical, which increases the
optimization coverage and boosts the performance of applications.
The algorithm consists of two phases: matching and inference:
- At the matching phase, we try to "guess" as many block and jump counts from
the stale profile as possible. To this end, the content of each basic block
is hashed and stored in the (yaml) profile. When BOLT optimizes a binary,
it computes block hashes and identifies the corresponding entries in the
stale profile. It yields a partial profile for every CFG in the binary.
- At the inference phase, we employ a network flow-based algorithm (profi) to
reconstruct "realistic" block and jump counts from the partial profile
generated at the first stage. In practice, we don't always produce proper
profile data but the majority (e.g., >90%) of CFGs get the correct counts.
This is a first part of the change; the next stacked diff extends the block hashing
and provides perf evaluation numbers.
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D144500
`Function.RawBranchCount` is initialized for fdata profile but not for yaml one.
The diff adds the computation of the field for yaml profiles
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D144211
Summary:
Refactor bolt/*/Profile to follow the braces rule for if/else/loop from
[LLVM Coding Standards](https://llvm.org/docs/CodingStandards.html).
(cherry picked from FBD33345741)
Summary:
Switched members of BinaryFunction to ADT where it was possible and
made sense. As a result, the size of BinaryFunction on x86-64 Linux
reduced from 1624 bytes to 1448.
(cherry picked from FBD32981555)
Summary:
Moves source files into separate components, and make explicit
component dependency on each other, so LLVM build system knows how to
build BOLT in BUILD_SHARED_LIBS=ON.
Please use the -c merge.renamelimit=230 git option when rebasing your
work on top of this change.
To achieve this, we create a new library to hold core IR files (most
classes beginning with Binary in their names), a new library to hold
Utils, some command line options shared across both RewriteInstance
and core IR files, a new library called Rewrite to hold most classes
concerned with running top-level functions coordinating the binary
rewriting process, and a new library called Profile to hold classes
dealing with profile reading and writing.
To remove the dependency from BinaryContext into X86-specific classes,
we do some refactoring on the BinaryContext constructor to receive a
reference to the specific backend directly from RewriteInstance. Then,
the dependency on X86 or AArch64-specific classes is transfered to the
Rewrite library. We can't have the Core library depend on targets
because targets depend on Core (which would create a cycle).
Files implementing the entry point of a tool are transferred to the
tools/ folder. All header files are transferred to the include/
folder. The src/ folder was renamed to lib/.
(cherry picked from FBD32746834)