Commit Graph

1949 Commits

Author SHA1 Message Date
ShatianWang
1577483413 [BOLT] Don't split likely fallthrough in CDSplit (#76164)
This diff speeds up CDSplit by not considering any hot-warm splitting
point that could break a fall-through branch from a basic block to its
most likely successor.

Co-authored-by: spupyrev <spupyrev@fb.com>
2023-12-21 16:17:10 -05:00
Alexander Yermolovich
ad4cead67c [BOLT][DWARF][NFC] Initialize CloneUnitCtxMap with current partition size (#75876)
We would always allocate the maximum amount for the vector containing
DWARFUnitInfo. In real use cases, what ends up happening is that we
allocate a giant vector when processing one CU, or, for the ThinLTO case,
multiple CUs. This led to a lot of memory overhead, and a 2x BOLT
processing slowdown for at least one service built with monolithic DWARF.

For binaries built with LTO with clang, all CUs that have cross
references will share an abbrev table and will be processed in one
batch. The rest of the CUs are processed in batches of
--cu-processing-batch-size, which defaults to 1.

For theoretical cases where cross-CU references are present but the CUs
do not share an abbrev table, the size of CloneUnitCtxMap will increase
as each CU is processed.
2023-12-20 16:12:52 -08:00
Jon Roelofs
d6f772074c fixup! fixup! [GlobalISel] Always direct-call IFuncs and Aliases (#74902)
Apparently some BOLT bots build with a pre-installed system clang, and others
use the just-built one. These two clangs now behave slightly differently when
it comes to ifunc codegen after https://github.com/llvm/llvm-project/pull/74902

Change the test to accept both patterns.
2023-12-15 12:48:11 -07:00
Jon Roelofs
3017adb37e fixup! [GlobalISel] Always direct-call IFuncs and Aliases (#74902)
The codegen change broke one of the BOLT tests.
2023-12-15 12:17:07 -07:00
Wang Yaduo
c532ba4edd [RISCV] Support printing immediate of RISCV MCInst in hexadecimal format (#74053)
Enable the llvm-objdump to disassemble the immediate of RISCV
instruction in hexadecimal format with --print-imm-hex flag.
2023-12-14 22:42:11 -08:00
Vitaly Buka
fc3adf74d3 Revert "[RISCV] Support printing immediate of RISCV MCInst in hexadecimal format" (#75561)
Reverts llvm/llvm-project#74053

Breaks https://lab.llvm.org/buildbot/#/builders/5/builds/39291

Co-authored-by: Wang Yaduo <wangyaduo@linux.alibaba.com>

Issue #75563
2023-12-14 22:05:47 -08:00
Wang Yaduo
3dde0d0256 [RISCV] Support printing immediate of RISCV MCInst in hexadecimal format (#74053)
Enable the llvm-objdump to disassemble the immediate of RISCV
instruction in hexadecimal format with --print-imm-hex flag.
2023-12-15 10:13:20 +08:00
Alexander Yermolovich
bf2b035e58 [BOLT][DWARF] Fix handling .debug_str_offsets for type units (#75522)
There was an assumption that TUs and CUs share a .debug_str_offsets
contribution. For ThinLTO builds it is not the case. Changed so that we
parse contributions for TUs also, and did some refactoring so that we
don't re-parse contributions that were not modified.
2023-12-14 17:27:21 -08:00
Kazu Hirata
ad8fd5b185 [BOLT] Use StringRef::{starts,ends}_with (NFC)
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.

I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
2023-12-13 23:34:49 -08:00
Rafael Auler
a26aa79a3b [BOLT] Fix some dwarf tests affected by 75095 (#75327)
PR 75095 introduced some changes to lld that broke some dwarf tests that
were being incorrectly linked as a PIE. Add flags to disable any PIC/PIE
compilation, so the linker can succeed and the tests can run as
intended.
2023-12-13 06:11:15 -08:00
Alexander Yermolovich
fb9a851224 [BOLT][DWARF] Fix handling of debug_str_offsets (#75100)
We were not setting the size field of .debug_str_offsets correctly. Fixed
it, and added a test.
2023-12-11 15:56:32 -08:00
Kazu Hirata
1cc5431285 [BOLT] Fix warnings
This patch fixes:

  bolt/lib/Core/BinaryFunctionProfile.cpp:222:10: error: variable
  'BBMergeSI' set but not used [-Werror,-Wunused-but-set-variable]

  bolt/lib/Passes/VeneerElimination.cpp:67:12: error: variable
  'VeneerCallers' set but not used [-Werror,-Wunused-but-set-variable]
2023-12-11 12:55:29 -08:00
Amir Ayupov
b039ccc684 [BOLT] Provide backwards compatibility for YAML profile with std::hash (#74253)
Provide backwards compatibility for YAML profile that uses `std::hash`:
xxh3 hash is the default for newly produced profile (sets `std-hash:
false`),
whereas the profile that doesn't specify `std-hash` will be treated as
`std-hash: true`, preserving old behavior.
2023-12-11 12:27:32 -08:00
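The defaulting rule described in b039ccc684 can be sketched as follows. This is only an illustration of the rule in the commit message, not BOLT's actual profile reader; `HashFunction` and `resolveHashFunction` are hypothetical names, and the optional `std-hash` key is modeled as a `std::optional<bool>`.

```cpp
#include <optional>

// Hypothetical names for illustration; not BOLT's real data structures.
enum class HashFunction { StdHash, XXH3 };

// StdHashField models the optional `std-hash` key of a YAML profile header.
HashFunction resolveHashFunction(std::optional<bool> StdHashField) {
  // A missing key means a legacy profile, so fall back to std::hash.
  const bool UseStdHash = StdHashField.value_or(true);
  return UseStdHash ? HashFunction::StdHash : HashFunction::XXH3;
}
```

Under this sketch, a newly produced profile carrying `std-hash: false` resolves to xxh3, while a profile without the key keeps the old behavior.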
sinan
fdb13cf531 [BOLT] Fix local out-of-range stub issue in LongJmp (#73918)
If a local stub is out-of-range, at LongJmp we will try to find another
local stub first. However, the original implementation does not work as
expected and leads to an infinite loop between replaceTargetWithStub
and fixBranches.

After this patch, we first convert the target of BB back to the target
of the local stub, and then look up other valid local stubs and so
on.
2023-12-11 10:38:28 +08:00
Nathan Sidwell
9596676e65 [BOLT] Determine address size from binary (#74870)
Query the executable for address size.
2023-12-09 14:39:57 -05:00
Ho Cheung
fa5486e487 [BOLT] [Passes] Fix two compile warnings in BOLT (#73086)
Fix build issue on Windows.

issue:#73085

@maksfb PTAL thank you
2023-12-06 11:19:07 -08:00
sinan
b304873134 [BOLT] Fix a wrong compiler option in test (#74420)
-nopie is an OpenBSD option, and Linux distributions might
report an `unsupported option '-nopie' for target` error.
2023-12-06 17:16:48 +08:00
eleviant
f20af7372f [bolt] Support arm64 FP register spills (#73021)
At the moment llvm-bolt fails when analyzing jump tables on aarch64 if
an FP register spill/reload is used.
2023-12-05 20:32:58 +01:00
ShatianWang
296088bdf3 [BOLT][NFC] Remove unused code for CDSplit (#74136)
This diff removes JumpInfo related code that is no longer needed by
CDSplit from SplitFunctions.cpp.
2023-12-01 15:21:30 -05:00
Amir Ayupov
9584f58344 [BOLT][utils] Bump default time threshold to 2s in nfc-stat-parser 2023-12-01 09:57:48 -08:00
Amir Ayupov
76a9ea1321 [BOLT][utils] Remove heatmap mode detection from wrapper script
Heatmap mode has been moved to a separate tool. Drop the support in
llvm-bolt-wrapper.
2023-12-01 09:57:48 -08:00
ShatianWang
4483cf2d8b [BOLT] CDSplit main logic part 2/2 (#74032)
This diff implements the main splitting logic of CDSplit. CDSplit
processes functions in a binary in parallel. For each function BF, it
assumes that all other functions are hot-cold split. For each possible
hot-warm split point of BF, it computes its corresponding SplitScore,
and chooses the split point with the best SplitScore. The SplitScore of
each split point is computed in the following way: each call edge or
jump edge has an edge score that is proportional to its execution count,
and inversely proportional to its distance. The SplitScore of a split
point is a sum of edge scores over a fixed set of edges whose distance
can change due to hot-warm splitting BF. This set contains all cover
calls in the form of X->Y or Y->X given function order [... X ... BF ...
Y ...]; we refer to the sum of edge scores over the set of cover calls
as CoverCallScore. This set also contains all jump edges (branches)
within BF as well as all call edges originated from BF; we refer to the
sum of edge scores over this set of edges as LocalScore. CDSplit finds
the split index maximizing CoverCallScore + LocalScore.
2023-11-30 23:17:11 -05:00
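The scoring scheme described in 4483cf2d8b can be sketched roughly as below. This is a simplified model and not the code in SplitFunctions.cpp: the exact edge-score formula is not given in the commit message, so `Count / Distance` is an assumption that merely satisfies "proportional to execution count, inversely proportional to distance". The names `Edge`, `edgeScore`, `splitScore`, and `bestSplitIndex` are hypothetical, and the caller is assumed to have recomputed edge distances for each candidate split index.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// One call or jump edge whose distance depends on the chosen split point.
struct Edge {
  uint64_t Count;    // execution count of the edge
  uint64_t Distance; // byte distance between source and target
};

// Assumed formula: proportional to count, inversely proportional to distance.
double edgeScore(const Edge &E) {
  // Treat a zero distance as 1 to avoid dividing by zero in this sketch.
  const uint64_t D = std::max<uint64_t>(E.Distance, 1);
  return static_cast<double>(E.Count) / static_cast<double>(D);
}

// Sum of edge scores over cover calls plus local jumps and calls.
double splitScore(const std::vector<Edge> &Edges) {
  double Score = 0.0;
  for (const Edge &E : Edges)
    Score += edgeScore(E);
  return Score;
}

// EdgesPerSplit[I] holds the affected edges with distances recomputed as if
// the function were hot-warm split at index I. Returns the index with the
// highest score (the first one on ties, 0 for an empty input).
size_t bestSplitIndex(const std::vector<std::vector<Edge>> &EdgesPerSplit) {
  size_t Best = 0;
  double BestScore = -1.0;
  for (size_t I = 0; I < EdgesPerSplit.size(); ++I) {
    const double Score = splitScore(EdgesPerSplit[I]);
    if (Score > BestScore) {
      BestScore = Score;
      Best = I;
    }
  }
  return Best;
}
```

A split point that shortens a frequently executed edge raises the score, which is why the maximizing index is chosen.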
ShatianWang
56bbf8135e [BOLT] CDSplit main logic part 1/2 (#73895)
This diff defines and initializes auxiliary variables used by CDSplit
and implements two important helper functions. The first helper function
approximates the block level size increase if a function is hot-warm
split at a given split index (X86 specific). The second helper function
finds all calls in the form of X->Y or Y->X for each BF given function
order [... X ... BF ... Y ...]. These calls are referred to as "cover
calls". Their distance will decrease if BF's hot fragment size is
further reduced by hot-warm splitting. NFC.
2023-11-30 20:55:36 -05:00
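The "cover call" definition in 56bbf8135e can be illustrated with a small sketch. It assumes each function is identified by its position in the final function order; `Call` and `findCoverCalls` are hypothetical names, not the helper added by the commit.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// A call edge between two functions, identified by their positions in the
// function order [... X ... BF ... Y ...].
struct Call {
  size_t Caller;
  size_t Callee;
};

// Returns the calls X->Y or Y->X where X is placed before BF and Y after it.
// Calls that start or end at BF itself are not cover calls.
std::vector<Call> findCoverCalls(const std::vector<Call> &Calls,
                                 size_t BFIndex) {
  std::vector<Call> Cover;
  for (const Call &C : Calls) {
    const size_t Lo = std::min(C.Caller, C.Callee);
    const size_t Hi = std::max(C.Caller, C.Callee);
    if (Lo < BFIndex && BFIndex < Hi)
      Cover.push_back(C);
  }
  return Cover;
}
```

Because such calls jump over BF, shrinking BF's hot fragment shortens their distance.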
Maksim Panchenko
4f3081296f [BOLT][NFC] Fix comment (#73983)
Fix off-by-one error in comment.
2023-11-30 14:31:38 -08:00
Alexander Yermolovich
52be47b890 [BOLT][DWARF] Add support to create path (#73884)
When option --dwarf-output-path is specified, if the path does not exist
BOLT will now create it. This is what also happens when
--plugin-opt=dwo_dir=<value> is specified to LLD.
2023-11-30 09:41:01 -08:00
ShatianWang
c43d0432ef [BOLT] Create .text.warm for 3-way splitting (#73863)
This commit explicitly adds a warm code section, .text.warm, when
-split-functions -split-strategy=cdsplit is used. This replaces the
previous approach of using .text.cold.0 as warm and .text.cold.1 as cold
in 3-way function splitting. NFC.
2023-11-29 22:42:36 -05:00
Maksim Panchenko
4bcbbe1f70 [BOLT] Refactor fixBranches() (#73752)
Simplify code in fixBranches(). Mostly NFC, except the x86-specific
check for code fragments now takes into account the presence of more than
two fragments. Should only matter when we split code into multiple
fragments and can run fixBranches() more than once.

Also, don't replace a branch target with the same one, as such an operation
may allocate memory for an extra MCSymbolRefExpr.
2023-11-29 16:24:16 -08:00
ShatianWang
076bd22f57 [BOLT] Add structure of CDSplit to SplitFunctions (#73430)
This commit establishes the general structure of the CDSplit strategy in
SplitFunctions without incorporating the exact splitting logic. With
-split-functions -split-strategy=cdsplit, the SplitFunctions pass will
run twice: the first time is before function reordering and functions
are hot-cold split; the second time is after function reordering and
functions are hot-warm-cold split based on the fixed function ordering.
Currently, all functions are hot-warm split after the entry block in the
second splitting pass. Subsequent commits will introduce the precise
splitting logic. NFC.
2023-11-29 15:43:21 -05:00
Maksim Panchenko
0acfe8483a [BOLT][DWARF] Fix output ranges for deleted code (#73464)
Set range low_pc to 0 for DIEs that correspond to deleted code.

Fixes #73428
2023-11-28 22:40:53 -08:00
Alexander Yermolovich
00dbea7c73 [BOLT][DWARF][NFC] Added const to variable (#73731)
Nit followup to 72729.
2023-11-28 17:30:28 -08:00
Alexander Yermolovich
b47b3bee7b [BOLT][DWARF] Fix handling of DWARF5 DWP (#72729)
Fixed handling of DWP as input. Previously, BOLT crashed. Now it will
write out the correct CU and all the TUs. A potential future improvement
is to scan all the TUs used in this CU, and only include those.
2023-11-28 15:54:14 -08:00
Amir Ayupov
202dda8e5c [BOLT][utils] Bump default time threshold to 1s in nfc-stat-parser 2023-11-28 08:55:30 -08:00
Amir Ayupov
af4d8d5af6 [BOLT][test] Update perf2bolt/perf_test.test (#73482) 2023-11-28 07:00:07 -08:00
spupyrev
e7dd596c68 [BOLT] Use deterministic xxh3 for computing BF/BB hashes (#72542)
std::hash and ADT/Hashing::hash_value are non-deterministic functions
whose results might vary across implementation/process/execution. Using
xxh3 instead for computing hashes of BinaryFunctions and BinaryBasicBlock
for stale profile matching.
(A possible alternative is to use ADT/StableHashing.h based on FNV
hashing but xxh3 seems to be more popular in LLVM)

This is to address https://github.com/llvm/llvm-project/issues/65241.
2023-11-27 14:45:46 -08:00
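To illustrate what "deterministic" means for e7dd596c68, here is a minimal sketch of a hash whose result depends only on the input bytes. It uses 64-bit FNV-1a, the family the commit message mentions as a possible alternative, because xxh3 is not available in the C++ standard library. It is not the hash BOLT uses, and `fnv1a64` is a hypothetical helper name.

```cpp
#include <cstdint>
#include <string_view>

// 64-bit FNV-1a: the result depends only on the input bytes, so it is stable
// across processes, runs, and standard library implementations.
uint64_t fnv1a64(std::string_view Data) {
  uint64_t Hash = 0xcbf29ce484222325ULL; // FNV offset basis
  for (unsigned char Byte : Data) {
    Hash ^= Byte;
    Hash *= 0x100000001b3ULL; // FNV prime
  }
  return Hash;
}
```

By contrast, the value returned by `std::hash<std::string>` is implementation-defined, so a profile that stores such hashes cannot be matched reliably across toolchains.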
Amir Ayupov
ab14eb23b6 [BOLT][test] Replace /dev/null with temp file (#73485)
NFC processing time script identifies tests by output filename.
When `/dev/null` is used as output filename, we're unable to tell the
source test, and the reports are unhelpful.
Replace `/dev/null` with `%t.null` which resolves the issue.
2023-11-27 10:53:18 -08:00
Maksim Panchenko
f4834255d3 [BOLT] Reset output addresses for deleted blocks (#73429)
This is a follow-up to #73076. We need to reset output addresses for
deleted blocks, otherwise the address translation may mistakenly
attribute input address of a deleted block to a non-zero address.

While working on a test case, I've discovered that DWARF output ranges
were already broken for deleted basic blocks: #73428. I will provide a
test case for this PR with a DWARF address range fix PR.
2023-11-25 23:23:47 -08:00
Maksim Panchenko
365114292a [BOLT][NFC] Refactor function state check (#73420)
Remove redundant check in updateOutputValues().
2023-11-25 21:09:54 -08:00
ShatianWang
d333c0e062 [BOLT] Extend calculateEmittedSize() for block size calculation (#73076)
This commit modifies BinaryContext::calculateEmittedSize() to update 
the BinaryBasicBlock::OutputAddressRange of each basic block in the
function in place. BinaryBasicBlock::getOutputSize() now gives the 
emitted size of the basic block.
2023-11-23 15:28:31 -05:00
Ho Cheung
3af586f797 [BOLT] Fix type mismatch error (#73016)
Fix build issue on Windows.

Fixes #73006
2023-11-21 19:13:46 -08:00
llongint
f3e54f2f97 [BOLT][NFC] Extract a function for dump MCInst (#67225)
In GDB debugging, obtaining the assembly representation of MCInst is
more intuitive.
2023-11-21 20:30:44 +08:00
Maksim Panchenko
84602066a6 [BOLT] Fix C++ exceptions when LPStart is specified (#72737)
Whenever LPStartEncoding was different from DW_EH_PE_omit, we used to
miscalculate LPStart. As a result, landing pads were assigned wrong
addresses. Fix that.
2023-11-20 20:55:38 -08:00
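The commit message for 84602066a6 does not show the fix itself. The sketch below only models the general LSDA convention it relies on: landing pad offsets are relative to LPStart, and LPStart falls back to the function start when its encoding is DW_EH_PE_omit (0xff). The function name and parameters are hypothetical, and LPStart is assumed to be already decoded.

```cpp
#include <cstdint>

// Value of DW_EH_PE_omit in the exception-handling pointer encodings.
constexpr uint8_t EncodingOmit = 0xff;

// Returns the absolute landing pad address for one call site record, or 0 if
// the record has no landing pad. LPStart is only meaningful when
// LPStartEncoding is not EncodingOmit.
uint64_t landingPadAddress(uint8_t LPStartEncoding, uint64_t LPStart,
                           uint64_t FunctionStart, uint64_t LandingPadOffset) {
  if (LandingPadOffset == 0)
    return 0; // no landing pad for this call site
  const uint64_t Base =
      LPStartEncoding == EncodingOmit ? FunctionStart : LPStart;
  return Base + LandingPadOffset;
}
```

Using the function start as the base when an explicit LPStart is present would give the kind of wrong landing pad addresses the commit describes.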
Maksim Panchenko
445f6f1373 [BOLT][TEST] Remove LTO flag from a test (#72896)
The LTO flag is not needed for the test to work properly. However, it
may not build on a system where compiler and linker versions don't match
one another. Remove the LTO flag.
2023-11-20 10:24:34 -08:00
Maksim Panchenko
f653f6d57a [BOLT][NFC] Delete unused declarations (#72596) 2023-11-16 23:36:19 -08:00
JohnLee1243
ae51ec84bb [Bolt] Solving pie support issue (#65494)
PIE is now supported by default after clang 14. This causes a parsing
error when using perf2bolt, because the base address cannot be obtained
correctly.
Fix the method of getting the base address. If SegInfo.Alignment is not
equal to the page size, alignDown(SegInfo.FileOffset, SegInfo.Alignment)
may not equal FileOffset. So SegInfo.FileOffset and FileOffset should
both be aligned down by SegInfo.Alignment first and then compared for
equality.
The .text segment's offset from the base address in the VAS is aligned to
the page size. So MMapAddress's offset from the base address is
alignDown(SegInfo.Address, pagesize) instead of
alignDown(SegInfo.Address, SegInfo.Alignment), and the base address
calculation is changed accordingly.

Co-authored-by: Li Zhuohang <lizhuohang3@huawei.com>
2023-11-16 15:05:06 +08:00
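The calculation described in ae51ec84bb can be sketched as below. This follows the commit message only and is not a copy of BOLT's code; `SegmentInfo` and `baseAddress` are hypothetical names, and the page size is passed in explicitly.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Rounds Value down to a multiple of Align (Align must be non-zero).
uint64_t alignDown(uint64_t Value, uint64_t Align) {
  assert(Align != 0 && "alignment must be non-zero");
  return Value - Value % Align;
}

// Hypothetical stand-in for the segment fields named in the commit message.
struct SegmentInfo {
  uint64_t Address;    // virtual address from the program header
  uint64_t FileOffset; // file offset from the program header
  uint64_t Alignment;  // p_align of the segment
};

// Returns the base address if the mapping at MMapAddress with file offset
// MMapFileOffset corresponds to Seg, and std::nullopt otherwise.
std::optional<uint64_t> baseAddress(const SegmentInfo &Seg,
                                    uint64_t MMapAddress,
                                    uint64_t MMapFileOffset,
                                    uint64_t PageSize) {
  // Align both offsets by the segment alignment before comparing them.
  if (alignDown(Seg.FileOffset, Seg.Alignment) !=
      alignDown(MMapFileOffset, Seg.Alignment))
    return std::nullopt;
  // The mapping's offset from the base address is page-aligned.
  return MMapAddress - alignDown(Seg.Address, PageSize);
}
```

For a segment with address 0x1234, a 2 MiB alignment, and 4 KiB pages, a mapping at 0x555555555000 yields a base address of 0x555555554000 under this sketch.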
Vladislav Khmelevsky
5b59540661 [BOLT] Enhance fixed indirect branch handling (#71324)
Previously, HasFixedIndirectBranch was set in BF so that isSimple could be
set to false later, because the unreachable BB elimination pass might
remove a BB whose symbols are accessed by instructions other than calls.
It seems that a better solution is to add an extra entry point at the
target offset instead of marking the BF as non-simple.
2023-11-16 09:30:55 +04:00
Vladislav Khmelevsky
c5a306f07e [BOLT] Fix LSDA section handling (#71821)
Currently BOLT finds the LSDA section by its name, .gcc_except_table.
But sometimes it might have a suffix, e.g. .gcc_except_table.main. Find
the LSDA section by its address rather than by its name.
Fixes #71804
2023-11-15 23:21:50 +04:00
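The idea behind c5a306f07e, looking a section up by address so that a name suffix no longer matters, can be sketched as follows. `Section` and `findSectionContaining` are hypothetical names and do not correspond to BOLT's BinarySection API.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Minimal stand-in for a section table entry.
struct Section {
  std::string Name;
  uint64_t Address;
  uint64_t Size;
};

// Returns the section whose address range contains Address, or nullptr.
const Section *findSectionContaining(const std::vector<Section> &Sections,
                                     uint64_t Address) {
  for (const Section &S : Sections)
    if (Address >= S.Address && Address - S.Address < S.Size)
      return &S;
  return nullptr;
}
```

With this kind of lookup, an LSDA address resolves to its section whether the section is named .gcc_except_table or .gcc_except_table.main.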
Maksim Panchenko
e823136d43 [BOLT] Refactor --keep-nops option. NFC. (#72228)
Run RemoveNops pass only if --keep-nops is set to false (default).
2023-11-14 11:28:13 -08:00
Maksim Panchenko
f633f325a1 [BOLT] Fix NOP instruction emission on x86 (#72186)
Use MCAsmBackend::writeNopData() interface to emit NOP instructions on
x86. There are multiple forms of NOP instruction on x86 with different
sizes. Currently, LLVM's assembly/disassembly does not support all forms
correctly which can lead to a breakage of input code semantics, e.g. if
the program relies on NOP instructions for reserving a patch space.

Add "--keep-nops" option to preserve NOP instructions.
2023-11-13 18:12:39 -08:00
Maksim Panchenko
2db9b6a93f [BOLT] Make instruction size a first-class annotation (#72167)
When NOP instructions are used to reserve space in the code, e.g. for
patching, it becomes critical to preserve their original size while
emitting the code. On x86, we rely on the "Size" annotation for NOP
instruction size, as the original instruction size is lost in the
disassembly/assembly process.

This change makes instruction size a first-class annotation and is
effectively NFCI. A follow-up diff will use the annotation for code
emission.
2023-11-13 14:33:39 -08:00
Maksim Panchenko
ec4a03c658 [BOLT] Enhance LowerAnnotations pass. NFCI. (#71847)
After #70147, all primary annotation types are stored directly in the
instruction and hence there's no need for the temporary storage we've
used previously for repopulating preserved annotations.
2023-11-12 19:34:42 -08:00