Memoize SymbolRef::getAddress() for sorting symbol table entries by
their address. Saves about 10 seconds of processing time on large
binaries with over 2 million symbols. NFCI.
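A minimal sketch of the memoization idea, not the actual patch: each address
is computed once and the sort runs on the cached key. `Symbol`, its `Lookups`
counter, and `sortByAddress` are hypothetical stand-ins; the real code operates
on `SymbolRef`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a symbol whose address lookup is expensive.
struct Symbol {
  uint64_t RawAddress;
  mutable unsigned Lookups = 0; // counts calls, for illustration only

  uint64_t getAddress() const {
    ++Lookups;
    return RawAddress;
  }
};

// Returns pointers to the symbols ordered by address. getAddress() is called
// exactly once per symbol instead of once per comparison.
std::vector<const Symbol *> sortByAddress(const std::vector<Symbol> &Symbols) {
  std::vector<std::pair<uint64_t, const Symbol *>> Keyed;
  Keyed.reserve(Symbols.size());
  for (const Symbol &S : Symbols)
    Keyed.emplace_back(S.getAddress(), &S);

  std::stable_sort(
      Keyed.begin(), Keyed.end(),
      [](const auto &A, const auto &B) { return A.first < B.first; });

  std::vector<const Symbol *> Sorted;
  Sorted.reserve(Keyed.size());
  for (const auto &Entry : Keyed)
    Sorted.push_back(Entry.second);
  assert(Sorted.size() == Symbols.size());
  return Sorted;
}
```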
Reviewed By: jobnoorman, Amir
Differential Revision: https://reviews.llvm.org/D159524
If an R_AARCH64_CALL26 relocation is against a symbol that has a lower
address, encodeValueAArch64 returns a wrong value.
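For reference, a sketch of how the 26-bit immediate of a `bl` instruction is
formed for a forward or backward call. This is not the code of
encodeValueAArch64: `encodeCall26` is a hypothetical helper, and reading
"lower address" as a backward call (negative offset) is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Encodes an AArch64 `bl` located at PC that calls Target.
uint32_t encodeCall26(uint64_t Target, uint64_t PC) {
  // Unsigned subtraction wraps, so a backward call yields a large value whose
  // two's-complement interpretation is the negative byte offset.
  uint64_t Delta = Target - PC;
  int64_t Offset = static_cast<int64_t>(Delta);
  assert((Offset & 3) == 0 && "branch offsets are multiples of 4");
  assert(Offset >= -(int64_t(1) << 27) && Offset < (int64_t(1) << 27) &&
         "target is outside the +/-128 MiB range of bl");

  // Keep bits [27:2] of the offset; the mask drops the sign-extension bits
  // that would otherwise overwrite the opcode.
  uint32_t Imm26 = static_cast<uint32_t>(Delta >> 2) & 0x03ffffffu;
  return 0x94000000u | Imm26; // 0x94000000 is `bl` with a zero immediate
}
```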
Reviewed By: Kepontry, yota9
Differential Revision: https://reviews.llvm.org/D159513
When current state is `CFG_Finalized`, function `validateCFG()` should return true directly.
Reviewed By: maksfb, yota9, Kepontry
Differential Revision: https://reviews.llvm.org/D159410
Calls on RISC-V are typically compiled to `auipc`/`jalr` pairs to allow
a maximum target range (32-bit pc-relative). In order to optimize calls
to near targets, linker relaxation may replace those pairs with, for
example, single `jal` instructions.
To allow BOLT to freely reassign function addresses in relaxed binaries,
this patch proposes the following approach:
- Expand all relaxed calls back to `auipc`/`jalr`;
- Rely on JITLink to relax those back to shorter forms where possible.
This is implemented by detecting all possible call instructions and
replacing them with `PseudoCALL` (or `PseudoTAIL`) instructions. The
RISC-V backend then expands those and adds the necessary relocations for
relaxation.
Since BOLT generally ignores pseudo instructions, this patch makes
`MCPlusBuilder::isPseudo` virtual so that `RISCVMCPlusBuilder` can
override it to exclude `PseudoCALL` and `PseudoTAIL`.
To ensure JITLink knows about the correct section addresses while
relaxing, reassignment of addresses has been moved to a post-allocation
pass. Note that this is probably when it should have been done in the
first place, since in `notifyResolved` (where it was done before) all
symbols are supposed to be resolved already.
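As an illustration of the range decision behind call relaxation, a sketch
under simplifying assumptions: it only distinguishes `jal` (signed 21-bit,
2-byte-aligned offset, i.e. about +/-1 MiB) from the `auipc`/`jalr` pair and
ignores compressed forms. `CallForm` and `pickCallForm` are hypothetical names;
this is not how JITLink implements relaxation.

```cpp
#include <cassert>
#include <cstdint>

enum class CallForm { Jal, AuipcJalr };

// Picks the shortest call form that can reach Target from PC.
CallForm pickCallForm(uint64_t PC, uint64_t Target) {
  int64_t Offset = static_cast<int64_t>(Target - PC);
  assert((Offset & 1) == 0 && "instruction addresses are 2-byte aligned");

  // jal encodes a signed 21-bit offset: [-2^20, 2^20).
  const int64_t JalLimit = int64_t(1) << 20;
  bool FitsJal = Offset >= -JalLimit && Offset < JalLimit;
  return FitsJal ? CallForm::Jal : CallForm::AuipcJalr;
}
```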
Depends on D159082
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D159089
When linker relaxation is enabled on RISC-V, every branch has a relocation and a
corresponding symbol in the symbol table. BOLT currently registers all these
symbols as secondary entry points, causing almost every function to be marked as
multi-entry on RISC-V.
This patch modifies `adjustFunctionBoundaries` to ignore these symbols.
Note that I currently try to detect them by checking if a symbol's name
starts with the private label prefix as defined by `MCAsmInfo`. Since
I'm not entirely sure what multi-entry functions look like on different
targets, please check if this condition is correct. Maybe it could make
sense to only check this on RISC-V?
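The check described above can be sketched as a plain prefix test.
`isPrivateLabel` is a hypothetical helper, and the default prefix `.L` is the
usual ELF value assumed here; the patch takes the prefix from `MCAsmInfo`.

```cpp
#include <cassert>
#include <string>

// Returns true if Name starts with the private label prefix.
bool isPrivateLabel(const std::string &Name, const std::string &Prefix = ".L") {
  assert(!Prefix.empty() && "an empty prefix would match every symbol");
  // compare() clamps the length, so names shorter than the prefix are safe.
  return Name.compare(0, Prefix.size(), Prefix) == 0;
}
```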
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D159285
Reduce YAML profile processing times:
- preprocessProfile: speed up buildNameMaps by replacing ProfileNameToProfile
mapping with ProfileFunctionNames set and ProfileBFs vector.
Pre-look up YamlBF->BF correspondence, memoize in ProfileBFs.
- readProfile: replace iteration over all functions in the binary by iteration
over profile functions (strict match and LTO name match).
On a large binary (1.9M functions) and large YAML profile (121MB, 30k functions)
reduces profile steps runtime:
pre-process profile data: 12.4953s -> 10.7123s
process profile data: 9.8195s -> 5.6639s
Compared to fdata profile reading:
pre-process profile data: 8.0268s
process profile data: 1.0265s
process profile data pre-CFG: 0.1644s
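A sketch of the readProfile idea, iterating over profile entries and looking
each one up by name instead of scanning every function in the binary. It
covers strict name matching only; `BinaryFunc`, `ProfileEntry`,
`buildNameIndex`, and `applyProfile` are hypothetical and much simpler than the
real data structures.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

struct BinaryFunc {
  std::string Name;
  uint64_t ExecCount;
};

struct ProfileEntry {
  std::string Name;
  uint64_t ExecCount;
};

using NameIndex = std::unordered_map<std::string, BinaryFunc *>;

// Built once; Funcs must outlive the index and must not be resized.
NameIndex buildNameIndex(std::vector<BinaryFunc> &Funcs) {
  NameIndex Index;
  Index.reserve(Funcs.size());
  for (BinaryFunc &BF : Funcs)
    Index.emplace(BF.Name, &BF);
  return Index;
}

// Work is proportional to the profile size, not to the binary size.
// Returns the number of profile entries that matched a function.
std::size_t applyProfile(const NameIndex &Index,
                         const std::vector<ProfileEntry> &Profile) {
  std::size_t Matched = 0;
  for (const ProfileEntry &PE : Profile) {
    auto It = Index.find(PE.Name);
    if (It == Index.end())
      continue;
    assert(It->second != nullptr);
    It->second->ExecCount = PE.ExecCount;
    ++Matched;
  }
  return Matched;
}
```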
Reviewed By: #bolt, maksfb
Differential Revision: https://reviews.llvm.org/D159460
Two (minor) improvements for stale matching:
- always match entry blocks to each other, even if there is a hash mismatch;
- ignore nops in (loose) hash computation.
I recorded a small improvement in inference quality on my benchmarks. Tests are not affected.
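To illustrate the second point, a sketch of a hash that skips nops so that
blocks differing only in padding hash the same. Hashing opcode mnemonics with
FNV-1a is an assumption made for the example; `looseBlockHash` is hypothetical
and is not the hash BOLT computes.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// FNV-1a over the opcode mnemonics of a block, ignoring nops.
uint64_t looseBlockHash(const std::vector<std::string> &Opcodes) {
  const uint64_t Prime = 1099511628211ull;  // FNV-1a 64-bit prime
  uint64_t Hash = 14695981039346656037ull;  // FNV-1a 64-bit offset basis
  for (const std::string &Op : Opcodes) {
    assert(!Op.empty() && "opcode mnemonics are non-empty");
    if (Op == "nop")
      continue; // padding does not contribute to the hash
    for (unsigned char C : Op) {
      Hash ^= C;
      Hash *= Prime;
    }
    // Mix in a separator so opcode boundaries contribute to the hash.
    Hash ^= 0xff;
    Hash *= Prime;
  }
  return Hash;
}
```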
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D159488
Fix tests that are failing in cross-compilation after D151920
(https://lab.llvm.org/buildbot/#/builders/221/builds/17715):
- instrumentation-ind-call, basic-instrumentation: add -mno-outline-atomics flag to runtime lib
- bolt-address-translation-internal-call, internal-call-instrument: add %cflags
- meta-merge-fdata: restrict to x86_64
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D159094
Since the trap value issue is fixed in D158191, it should now pass
on both platforms.
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D158899
As discussed in D159266, for some instructions it's impossible to know
statically if they will load/store (e.g., predicated instructions).
Therefore, mayLoad/mayStore are more appropriate names.
`MCInstrDesc` provides the `mayLoad` and `mayStore` flags that seem
appropriate to use as a target-independent way to implement `isLoad` and
`isStore`.
I believe this is currently good enough to use for the RISC-V target as
well. I've provided a test for this that checks the generated dyno
stats (which seems to be the only thing both `isLoad` and `isStore` are
used for).
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D159266
Since the test executes an instrumented version of the binary, move it under
runtime/X86. Note that it can be adjusted to also run under AArch64 now that
instrumentation is supported.
Reviewed By: #bolt, maksfb
Differential Revision: https://reviews.llvm.org/D159298
Fine-tuning hash computation for stale matching:
- introducing a new "loose" basic block hash that allows matching many more blocks than before;
- tweaking parameters of the inference algorithm to find (slightly) better solutions;
- adding more meaningful tests for stale matching.
Tested the changes on several open-source benchmarks (clang, rocksdb, chrome)
and one prod workload using different compiler modes (LTO/PGO etc). There is
always an improvement in the quality of inferred profiles.
(The current implementation is still not optimal but the diff is a step forward;
I am open to further suggestions)
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D156278
If `Itr` points to the last element, then `std::next(Itr)` is
`Range.end()`, so the expression `std::next(Itr)->second` is
undefined behavior.
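A minimal sketch of the guarded access, assuming `Range` is a `std::map`;
`RangeMap` and `getNextValue` are hypothetical names, not the code touched by
the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>

using RangeMap = std::map<uint64_t, uint64_t>;

// Stores the mapped value of Itr's successor in Out and returns true, or
// returns false if Itr has no successor.
bool getNextValue(const RangeMap &Range, RangeMap::const_iterator Itr,
                  uint64_t &Out) {
  if (Itr == Range.end())
    return false;
  RangeMap::const_iterator Next = std::next(Itr);
  if (Next == Range.end())
    return false; // Itr was the last element; Next must not be dereferenced
  Out = Next->second;
  return true;
}
```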
Reviewed By: yota9, maksfb
Differential Revision: https://reviews.llvm.org/D159177
The relationship of X86 registers is shown in the diagram below. BL and BH do
not have a direct alias relationship. However, if the BH register cannot be
swapped, then the BX/EBX/RBX registers cannot be swapped either, which
means that the BL register cannot be swapped either. Therefore, in the presence
of BX/EBX/RBX registers, BL and BH have an alias relationship.
┌────────────────┐
│      RBX       │
├────┬───────────┤
│    │    EBX    │
├────┴──┬────────┤
│       │   BX   │
├───────┼───┬────┤
│       │BH │BL  │
└───────┴───┴────┘
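One way to express this rule, as a sketch: treat two registers as aliased for
swapping if one contains the other or if they share a super-register. The
table and `aliasForSwap` are hypothetical and only cover the registers needed
for the example; the patch itself works on LLVM register info.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

using SuperRegTable = std::map<std::string, std::set<std::string>>;

// Hypothetical table mapping a register to all of its super-registers.
const SuperRegTable &superRegs() {
  static const SuperRegTable Table = {
      {"BL", {"BX", "EBX", "RBX"}},
      {"BH", {"BX", "EBX", "RBX"}},
      {"BX", {"EBX", "RBX"}},
      {"EBX", {"RBX"}},
      {"RBX", std::set<std::string>{}},
      {"AL", {"AX", "EAX", "RAX"}},
  };
  return Table;
}

// True if swapping one register constrains the other.
bool aliasForSwap(const std::string &A, const std::string &B) {
  if (A == B)
    return true;
  const SuperRegTable &Table = superRegs();
  auto ItA = Table.find(A);
  auto ItB = Table.find(B);
  if (ItA == Table.end() || ItB == Table.end())
    return false;
  // One register contains the other.
  if (ItA->second.count(B) || ItB->second.count(A))
    return true;
  // Both are contained in a common super-register (e.g. BL and BH in BX).
  for (const std::string &Super : ItA->second)
    if (ItB->second.count(Super))
      return true;
  return false;
}
```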
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D155098
BOLT uses `MCAsmLayout` to calculate the output values of functions and
basic blocks. This means output values are calculated based on a
pre-linking state and any changes to symbol values during linking will
cause incorrect values to be used.
This issue can be triggered by enabling linker relaxation on RISC-V.
Since linker relaxation can remove instructions, symbol values may
change. This causes, among other things, the symbol table created by
BOLT in the output executable to be incorrect.
This patch solves this issue by using `BOLTLinker` to get symbol values
instead of `MCAsmLayout`. This way, output values are calculated based
on a post-linking state. To make sure the linker can update all
necessary symbols, this patch also makes sure all these symbols are not
marked as temporary so that they end up in the object file's symbol
table.
Note that this patch only deals with symbols of binary functions
(`BinaryFunction::updateOutputValues`). The technique described above
turned out to be too expensive for basic block symbols so those are
handled differently in D155604.
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D154604
When parsing AddressMap and there is a conflict in keys,
where two entries share the same key, consider the first entry as the
correct one, instead of the last. This matches previous behavior in
BOLT and covers cases such as BOLT creating a new basic block but
sharing the same input offset of the previous (or entry) basic
block. In this case, instead of translating debuginfo to use the newly
created BB, translate using the BB that was originally read from
input. This will increase our chances of getting debuginfo right.
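A sketch of the first-entry-wins policy, assuming the parsed entries are
(input address, output address) pairs stored in an `std::unordered_map`; the
container choice and the name `buildAddressMap` are assumptions made for the
example. `emplace` leaves an existing key untouched, whereas assigning through
`operator[]` would keep the last entry.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

using AddressPairs = std::vector<std::pair<uint64_t, uint64_t>>;

// Builds the input-to-output map, keeping the first entry for each key.
std::unordered_map<uint64_t, uint64_t>
buildAddressMap(const AddressPairs &Entries) {
  std::unordered_map<uint64_t, uint64_t> Map;
  for (const auto &Entry : Entries)
    Map.emplace(Entry.first, Entry.second); // no-op if the key already exists
  return Map;
}
```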
Tested via binary comparison in tests:
X86/dwarf4-df-input-lowpc-ranges.test
X86/dwarf5-df-input-lowpc-ranges.test
Reviewed By: #bolt, maksfb, jobnoorman
Differential Revision: https://reviews.llvm.org/D158686
This commit adds support for AArch64 in instrumentation runtime library,
including AArch64 system calls.
This commit also splits syscalls into target-specific files.
Reviewed By: rafauler, yota9
Differential Revision: https://reviews.llvm.org/D151942
This commit adds support for generation of getter counters for AArch64 MacOS.
Continuation of the work in D151899.
Reviewed By: rafauler, yota9
Differential Revision: https://reviews.llvm.org/D151901
This commit adds code generation for AArch64 instrumentation,
including direct and indirect calls support.
Reviewed By: rafauler, yota9
Differential Revision: https://reviews.llvm.org/D151899
The trap value used by BOLT was assumed to be a single-byte instruction.
This made some functions unaligned on AArch64 (e.g., in the
exceptions-instrumentation test) and caused emission failures. Fix that by
changing the fill value to a StringRef.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D158191
Because indirect call tables use static addresses for call sites, but pc
values recorded by runtime may be subject to ASLR in PIE, we couldn't
find indirect call descriptions by their runtime address in PIE. It
resulted in [unknown] entries in profile for all indirect calls. We need
to subtract the base address of .text from runtime addresses to get the
corresponding static addresses. Here we create a getter for the base address
of .text and subtract its return value from recorded PC values. It
converts them to static addresses, which then may be used to find the
corresponding indirect call descriptions.
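The lookup can be sketched as follows, assuming the getter returns the value
that, subtracted from a runtime PC, yields the static address, and that the
descriptions are kept sorted by static address. `IndCallDescription` and
`findIndCall` are hypothetical and do not mirror the runtime library's types.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct IndCallDescription {
  uint64_t StaticAddress; // call site address as recorded at BOLT time
  uint32_t Id;
};

// Table must be sorted by StaticAddress. Returns nullptr if no description
// exists for the call site.
const IndCallDescription *
findIndCall(const std::vector<IndCallDescription> &Table, uint64_t RuntimePC,
            uint64_t TextBase) {
  assert(std::is_sorted(Table.begin(), Table.end(),
                        [](const IndCallDescription &A,
                           const IndCallDescription &B) {
                          return A.StaticAddress < B.StaticAddress;
                        }));
  // Undo the ASLR displacement before searching the static table.
  uint64_t StaticPC = RuntimePC - TextBase;
  auto It = std::lower_bound(
      Table.begin(), Table.end(), StaticPC,
      [](const IndCallDescription &D, uint64_t Addr) {
        return D.StaticAddress < Addr;
      });
  if (It == Table.end() || It->StaticAddress != StaticPC)
    return nullptr;
  return &*It;
}
```

With `TextBase` equal to zero this degenerates to the non-PIE lookup.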
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D154121
When a binary is instrumented with the --instrumentation-sleep-time and
--instrumentation-wait-forks options and launched, the profile is
periodically written until all the forks die. The problem is that we
cannot wait for the whole process tree, and we have no way to tell when
it's safe to read the profile. However, if we keep the profile open
throughout the life of the process tree, we can use fuser to determine
when writing is finished.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D154436
The implementation is based on the X86 version, with the same code
for symbol and addend extraction. The differences are the added
support for RelType `R_AARCH64_CALL26` and the removal of the 8-bit
relocation.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D156018
This commit splits the createRelocation function for the X86 architecture
into two parts, retaining the first half and moving the second half to a
new function called extractFixupExpr. The purpose of this change is to make
extractFixupExpr a shared function between AArch64 and X86 architectures,
increasing code reusability and maintainability.
Child revision: https://reviews.llvm.org/D156018
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D157217
BOLT uses MCAsmLayout to calculate the output values of basic blocks.
This means output values are calculated based on a pre-linking state and
any changes to symbol values during linking will cause incorrect values
to be used.
This issue was first addressed in D154604 by adding all basic block
symbols to the symbol table for the linker to resolve them. However, the
runtime overhead of handling this huge symbol table turned out to be
prohibitively large.
This patch solves the issue in a different way. First, a temporary
section containing [input address, output symbol] pairs is emitted to the
intermediary object file. The linker will resolve all these references
so we end up with a section of [input address, output address] pairs.
This section is then parsed and used to:
- Replace BinaryBasicBlock::OffsetTranslationTable
- Replace BinaryFunction::InputOffsetToAddressMap
- Update BinaryBasicBlock::OutputAddressRange
Note that the reason this is more performant than the previous attempt
is that these symbol references do not cause entries to be added to the
symbol table. Instead, section-relative references are used for the
relocations.
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D155604
I noticed that `-reorder-functions=exec-count` doesn't work as expected due to
a bug in the comparison function (which isn't symmetric). It is questionable
whether anyone would ever want to use this sorting method (as sorting by, say,
density is much better in all cases), but it is probably better to fix the bug.
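For context, `std::sort` requires the comparator to be a strict weak ordering;
otherwise the result is unspecified at best. A sketch of a well-formed
comparator for ordering by execution count follows. `FuncInfo`, `hotterFirst`,
and the name tie-break are assumptions made for the example, not the fix in
this patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

struct FuncInfo {
  std::string Name;
  uint64_t ExecCount;
};

// Strict weak ordering: higher execution count first, ties broken by name so
// that hotterFirst(A, B) and hotterFirst(B, A) are never both true.
bool hotterFirst(const FuncInfo &A, const FuncInfo &B) {
  if (A.ExecCount != B.ExecCount)
    return A.ExecCount > B.ExecCount;
  return A.Name < B.Name;
}

void sortByExecCount(std::vector<FuncInfo> &Funcs) {
  std::sort(Funcs.begin(), Funcs.end(), hotterFirst);
  assert(std::is_sorted(Funcs.begin(), Funcs.end(), hotterFirst));
}
```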
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D152959
The compiler can generate invalid DIE references. Previously, BOLT could
assert when writing out IR to .debug_info. Changed where DIE offsets are updated
so that it is always done, making sure that the assert is not triggered.
Added more specific warnings, and the ability to print the offset of an invalid
referenced DIE when verbosity >= 1.
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D157746
This bug crept in when CU partitioning was introduced. It manifests itself when
CUs that use location lists come before CUs that are part of
ThinLTO. BOLT processes CUs with cross-CU references first (these are produced
by ThinLTO). When we wrote out all the location lists, we did it in the original
order. Since DWARF4 uses offsets directly into .debug_loc, those offsets in DIEs
became wrong.
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D157908