Commit Graph

351 Commits

Author SHA1 Message Date
Nathan Sidwell
4308c7422d [BOLT][NFC] Refactor relocation arch selection (#87829)
Convert the relocation routines to switch on architecture and have an explicit unreachable default.
2024-04-08 09:01:28 -04:00
Amir Ayupov
fd38366e45 [BOLT][NFC] Clean includes, add license headers (#87200) 2024-03-31 19:29:45 -07:00
Amir Ayupov
d12e45ad16 [BOLT][NFC] Split out DomTree construction from BF::calculateLoopInfo (#87181) 2024-03-31 06:24:19 -07:00
Amir Ayupov
c0febca3a6 [BOLT][NFC] Refactor BC::createBinaryContext for #81346 (#87172) 2024-03-30 20:43:23 -07:00
Maksim Panchenko
7de82ca369 [BOLT] Don't terminate on trap instruction for Linux kernel (#87021)
Under normal circumstances, we terminate basic blocks on a trap
instruction. However, Linux kernel may resume execution after hitting a
trap (ud2 on x86). Thus, we introduce "--terminal-trap" option that will
specify if the trap instruction should terminate the control flow. The
option is on by default except for the Linux kernel mode when it's off.
2024-03-29 16:41:15 -07:00
Amir Ayupov
d8fe2e4bb0 [BOLT] Fix enumeration of secondary entry points
Make them start with 1 instead of 0 (reserved for primary entry point).

Test Plan:
```
bin/llvm-lit -a tools/bolt/test/X86/yaml-secondary-entry-discriminator.s
```

Reviewers: rafaelauler, ayermolo, maksfb, dcci

Reviewed By: maksfb

Pull Request: https://github.com/llvm/llvm-project/pull/86848
2024-03-27 15:23:49 -07:00
Alexander Yermolovich
f3cfe016c5 [BOLT][DWARF] Add support for cross-cu references for debug-names (#86015)
The DW_AT_abstract_origin can be a cross-cu reference as a by-product of
LTO. On IR level for absolute references an address is stored, vs a DIE
for relative references. Added a map to keep track of cross-cu
referenced DIEs to use when we add an Entry.
2024-03-22 13:48:49 -07:00
Maksim Panchenko
6b1cf00400 [BOLT] Add support for Linux kernel static keys jump table (#86090)
Runtime code modification used by static keys is the most ubiquitous
self-modifying feature of the Linux kernel. The idea is to to eliminate
the condition check and associated conditional jump on a hot path if
that condition (based on a boolean value of a static key) does not
change often. Whenever they condition changes, the kernel runtime
modifies all code paths associated with that key flipping the code
between nop and (unconditional) jump.
2024-03-21 14:05:21 -07:00
Alexander Yermolovich
4841858862 [BOLT][DWARF] Add support to debug_names for DW_AT_abstract_origin/DW_AT_specification (#85485)
According to the DWARF spec a DIE that has DW_AT_specification or
DW_AT_abstract_origin can be part of .debug_name if a DIE those
attribute points to has DW_AT_name or DW_AT_linkage_name.
2024-03-18 15:28:01 -07:00
Alexander Yermolovich
a4610c7182 [BOLT][DWARF] Add support for DW_IDX_parent (#85285)
This adds support for DW_IDX_parent. If DIE has a parent then
DW_IDX_parent in Entry will point to Entry for that parent DIE.
Otherwise it will have DW_FORM_flag_present in abbrev. Which takes zero
space in Entry.

This came from

https://discourse.llvm.org/t/rfc-improve-dwarf-5-debug-names-type-lookup-parsing-speed/74151
2024-03-15 13:52:45 -07:00
Alexander Yermolovich
6d4aa9d70e [BOLT][DWWARF] Fix foreign TU index with local TUs (#84594)
The foreign TU list immediately follows the local TU list and they both
use the same index, so that if there are N local TU entries, the index
for the first foreign TU is N.

Changed so that the size of local TU is accounted for when setting
foreign TU index.
2024-03-11 12:20:25 -07:00
Mehdi Amini
716042a63f Rename llvm::ThreadPool -> llvm::DefaultThreadPool (NFC) (#83702)
The base class llvm::ThreadPoolInterface will be renamed
llvm::ThreadPool in a subsequent commit.

This is a breaking change: clients who use to create a ThreadPool must
now create a DefaultThreadPool instead.
2024-03-05 18:00:46 -08:00
Mehdi Amini
4a4fb930a5 Use the new ThreadPoolInterface base class instead of the concrete implementation (NFC) (#84056) 2024-03-05 12:37:11 -08:00
sinan
71c2a132b2 [BOLT] support AArch64 JUMP26 createRelocation (#83531)
Add R_AARCH64_JUMP26 implementation for createRelocation, which
could significantly reduce the number of failed scan-refs cases if we
perform bolt on a selective range of functions.
2024-03-04 17:11:47 +08:00
Maksim Panchenko
d7d564b2fc [BOLT] Add BinaryFunction::registerBranch(). NFC (#83337)
Add an external interface to register a branch in a function that is in
disassembled state. Allows to make custom modifications to the
disassembler. E.g., a pre-CFG pass can add an instruction and register a
branch that will later be used during the CFG construction.
2024-02-28 20:04:28 -08:00
Maksim Panchenko
3f2a9e5910 [BOLT] Sort TakenBranches immediately before use. NFCI (#83333)
Move code that sorts TakenBranches right before the branches are used.
We can populate TakenBranches in pre-CFG post-processing and hence have
to postpone the sorting to a later point in the processing pipeline.
Will add such a pass later. For now it's NFC.
2024-02-28 19:51:44 -08:00
Maksim Panchenko
7c206c7812 [BOLT] Refactor interface for instruction labels. NFCI (#83209)
To avoid accidentally setting the label twice for the same instruction,
which can lead to a "lost" label, introduce getOrSetInstLabel()
function. Rename existing functions to getInstLabel()/setInstLabel() to
make it explicit that they operate on instruction labels. Add an
assertion in setInstLabel() that the instruction did not have a prior
label set.
2024-02-27 18:44:28 -08:00
Alexander Yermolovich
6de5fcc746 [BOLT][DWARF] Add support for .debug_names (#81062)
DWARF5 spec supports the .debug_names acceleration table. This is the
formalized version of combination of gdb-index/pubnames/types. Added
implementation of it to BOLT. It supports both monolothic and split
dwarf, with and without Type Units. It does not include parent indices.
This will be in followup PR. Unlike LLVM output this will put all the
CUs and TUs into one Module.
2024-02-26 14:00:31 -08:00
Alexander Yermolovich
004c1972b4 [BOLT][DWARF][NFC] Expose DebugStrOffsetsWriter::clear (#82548)
Refactored cod that clears data-structures in DebugStrOffsetsWriter into
clear() function and made initialize() public. This is for
https://github.com/llvm/llvm-project/pull/81062.
2024-02-21 16:48:02 -08:00
Alexander Yermolovich
640e781dc8 [BOLT][DWARF][NFC] Use SkeletonCU in place of IsDWO check (#82540)
Changed isDWO to a function that checks Skeleton CU that is passed in.
This is for preparation for
https://github.com/llvm/llvm-project/pull/81062.
2024-02-21 16:18:18 -08:00
Maksim Panchenko
5daf2001a1 [BOLT] Fix memory leak in BinarySection (#82520)
The change in #80950 exposed a memory leak in BinarySection. Let
BinarySection manage memory passed via updateContents() unless a valid
SectionID is set indicating that the contents are managed by JITLink.
2024-02-21 11:54:34 -08:00
Alexander Yermolovich
c9e8e91aca [BOLT][DWARF] Fix out of order rangelists/loclists (#81645)
GCC can generate rangelists/loclists that are out of order. Fixed so
that we don't assert, and instead generate partially optimized list.
Through most code paths we do sort rnglists/loclists, but not for
loclist for a path where BOLT does not modify a function. Although it's
nice to have lists sorted, this implementation shouldn't rely on it.
This also fixes an issue if we partially capture a list we would write
out *end_of_list in helper function. So tools won't see the rest of the
addresses being written out.
2024-02-14 11:23:57 -08:00
Amir Ayupov
52cf07116b [BOLT][NFC] Log through JournalingStreams (#81524)
Make core BOLT functionality more friendly to being used as a
library instead of in our standalone driver llvm-bolt. To
accomplish this, we augment BinaryContext with journaling streams
that are to be used by most BOLT code whenever something needs to
be logged to the screen. Users of the library can decide if logs
should be printed to a file, no file or to the screen, as
before. To illustrate this, this patch adds a new option
`--log-file` that allows the user to redirect BOLT logging to a
file on disk or completely hide it by using
`--log-file=/dev/null`. Future BOLT code should now use
`BinaryContext::outs()` for printing important messages instead of
`llvm::outs()`. A new test log.test enforces this by verifying that
no strings are print to screen once the `--log-file` option is
used.

In previous patches we also added a new BOLTError class to report
common and fatal errors, so code shouldn't call exit(1) now. To
easily handle problems as before (by quitting with exit(1)),
callers can now use
`BinaryContext::logBOLTErrorsAndQuitOnFatal(Error)` whenever code
needs to deal with BOLT errors. To test this, we have fatal.s
that checks we are correctly quitting and printing a fatal error
to the screen.

Because this is a significant change by itself, not all code was
yet ported. Code from Profiler libs (DataAggregator and friends)
still print errors directly to screen.

Co-authored-by: Rafael Auler <rafaelauler@fb.com>

Test Plan: NFC
2024-02-12 14:53:53 -08:00
Amir Ayupov
13d60ce2f2 [BOLT][NFC] Propagate BOLTErrors from Core, RewriteInstance, and passes (2/2) (#81523)
As part of the effort to refactor old error handling code that
would directly call exit(1), in this patch continue the migration
on libCore, libRewrite and libPasses to use the new BOLTError
class whenever a failure occurs.

Test Plan: NFC

Co-authored-by: Rafael Auler <rafaelauler@fb.com>
2024-02-12 14:51:15 -08:00
Amir Ayupov
fa7dd4919a [BOLT][NFC] Add BOLTError and return it from passes (1/2) (#81522)
As part of the effort to refactor old error handling code that
would directly call exit(1), in this patch we add a new class
BOLTError and auxiliary functions `createFatalBOLTError()` and
`createNonFatalBOLTError()` that allow BOLT code to bubble up the
problem to the caller by using the Error class as a return
type (or Expected). Also changes passes to use these.

Co-authored-by: Rafael Auler <rafaelauler@fb.com>

Test Plan: NFC
2024-02-12 14:39:59 -08:00
Maksim Panchenko
8075f0db16 [BOLT] Use new contents when emitting sections with relocations (#80782)
We can use BinarySection::updateContents() to change section contents.
However, if we also add relocations for new contents, then the original
data (i.e. not updated) is going to be used. Fix that. A follow-up diff
will use the update interface and will include a test case.
2024-02-06 14:38:21 -08:00
Alexander Yermolovich
7d272722fb [BOLT][DWARF] Add option to specify DW_AT_comp_dir (#79395)
Added an --comp-dir-override option that overrides DW_AT_comp_dir in the
unit die. This allows for llvm-bolt to be invoked from any category and
still find .dwo files.
2024-01-25 15:00:52 -08:00
Amir Ayupov
9fec33aadc Revert "[BOLT] Fix unconditional output of boltedcollection in merge-fdata (#78653)"
This reverts commit 82bc33ea3f.

Accidentally pushed unrelated changes.
2024-01-18 19:59:09 -08:00
Amir Ayupov
82bc33ea3f [BOLT] Fix unconditional output of boltedcollection in merge-fdata (#78653)
Fix the bug where merge-fdata unconditionally outputs boltedcollection 
line, regardless of whether input files have it set.

Test Plan:
Added bolt/test/X86/merge-fdata-nobat-mode.test which fails without this
fix.
2024-01-18 19:44:16 -08:00
Alexander Yermolovich
ad4cead67c [BOLT][DWARF][NFC] Initialize CloneUnitCtxMap with current partition size (#75876)
We would always allocate maximum amount for vector containing
DWARFUnitInfo. In real usecases what ends up hapenning is we allocate a
giant vector when processing one CU, or for thin-lto case multiple CUs.
This lead to a lot of memory overhead, and 2x BOLT processing slowdown
for at least one service built with monolithic DWARF.

For binaries built with LTO with clang all of CUs that have cross
references will share an abbrev table and will be processed in one
batch. Rest of CUs are processesd in --cu-processing-batch-size size.
Which defaults to 1.

For theoretical cases where cross-cu references are present, but they do
not share abbrev will increase the size of CloneUnitCtxMap as each CU is
being processsed.
2023-12-20 16:12:52 -08:00
Alexander Yermolovich
bf2b035e58 [BOLT][DWARF] Fix handling .debug_str_offsets for type units (#75522)
There was an assumpiton that TUs and CUs share .debug_str_offsets
contribution. For ThinLTO builds it is not the case. Changed so that we
parse contributions for TUs also, and did some refactoring so that we
don't re-parse contributions that were not modified.
2023-12-14 17:27:21 -08:00
Kazu Hirata
ad8fd5b185 [BOLT] Use StringRef::{starts,ends}_with (NFC)
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.

I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
2023-12-13 23:34:49 -08:00
Alexander Yermolovich
fb9a851224 [BOLT][DWARF] Fix handling of debug_str_offsets (#75100)
We were not setting size field of .debug_str_offsets correctly. Fixed
it, and added a test.
2023-12-11 15:56:32 -08:00
Kazu Hirata
1cc5431285 [BOLT] Fix warnings
This patch fixes:

  bolt/lib/Core/BinaryFunctionProfile.cpp:222:10: error: variable
  'BBMergeSI' set but not used [-Werror,-Wunused-but-set-variable]

  bolt/lib/Passes/VeneerElimination.cpp:67:12: error: variable
  'VeneerCallers' set but not used [-Werror,-Wunused-but-set-variable]
2023-12-11 12:55:29 -08:00
Amir Ayupov
b039ccc684 [BOLT] Provide backwards compatibility for YAML profile with std::hash (#74253)
Provide backwards compatibility for YAML profile that uses `std::hash`:
xxh3 hash is the default for newly produced profile (sets `std-hash:
false`),
whereas the profile that doesn't specify `std-hash` will be treated as
`std-hash: true`, preserving old behavior.
2023-12-11 12:27:32 -08:00
Nathan Sidwell
9596676e65 [BOLT] Determine address size from binary (#74870)
Query the executable for address size.
2023-12-09 14:39:57 -05:00
ShatianWang
56bbf8135e [BOLT] CDSplit main logic part 1/2 (#73895)
This diff defines and initializes auxiliary variables used by CDSplit
and implements two important helper functions. The first helper function
approximates the block level size increase if a function is hot-warm
split at a given split index (X86 specific). The second helper function
finds all calls in the form of X->Y or Y->X for each BF given function
order [... X ... BF ... Y ...]. These calls are referred to as "cover
calls". Their distance will decrease if BF's hot fragment size is
further reduced by hot-warm splitting. NFC.
2023-11-30 20:55:36 -05:00
Maksim Panchenko
4f3081296f [BOLT][NFC] Fix comment (#73983)
Fix off-by-one error in comment.
2023-11-30 14:31:38 -08:00
ShatianWang
c43d0432ef [BOLT] Create .text.warm for 3-way splitting (#73863)
This commit explicitly adds a warm code section, .text.warm, when
-split-functions -split-strategy=cdsplit is used. This replaces the
previous approach of using .text.cold.0 as warm and .text.cold.1 as cold
in 3-way function splitting. NFC.
2023-11-29 22:42:36 -05:00
Maksim Panchenko
4bcbbe1f70 [BOLT] Refactor fixBranches() (#73752)
Simplify code in fixBranches(). Mostly NFC, accept the x86-specific
check for code fragments now takes into account presence of more than
two fragments. Should only matter when we split code into multiple
fragments and can run fixBranches() more than once.

Also, don't replace a branch target with the same one, as such operation
may allocate memory for extra MCSymbolRefExpr.
2023-11-29 16:24:16 -08:00
Alexander Yermolovich
00dbea7c73 [BOLT][DWARF][NFC] Added const to variable (#73731)
Nit followup to 72729.
2023-11-28 17:30:28 -08:00
Alexander Yermolovich
b47b3bee7b [BOLT][DWARF] Fix handling of DWARF5 DWP (#72729)
Fixed handling of DWP as input. Before BOLT crashed. Now it will write
out
correct CU, and all the TUs. Potential future improvement is to scan all
the TUs
used in this CU, and only include those.
2023-11-28 15:54:14 -08:00
spupyrev
e7dd596c68 [BOLT] Use deterministic xxh3 for computing BF/BB hashes (#72542)
std::hash and ADT/Hashing::hash_value are non-deterministic functions
whose
results might vary across implementation/process/execution. Using xxh3
instead
for computing hashes of BinaryFunctions and BinaryBasicBlock for stale
profile
matching.
(A possible alternative is to use ADT/StableHashing.h based on FNV
hashing but
xxh3 seems to be more popular in LLVM)

This is to address https://github.com/llvm/llvm-project/issues/65241.
2023-11-27 14:45:46 -08:00
Maksim Panchenko
f4834255d3 [BOLT] Reset output addresses for deleted blocks (#73429)
This is a follow-up to #73076. We need to reset output addresses for
deleted blocks, otherwise the address translation may mistakenly
attribute input address of a deleted block to a non-zero address.

While working on a test case, I've discovered that DWARF output ranges
were already broken for deleted basic blocks: #73428. I will provide a
test case for this PR with a DWARF address range fix PR.
2023-11-25 23:23:47 -08:00
Maksim Panchenko
365114292a [BOLT][NFC] Refactor function state check (#73420)
Remove redundant check in updateOutputValues().
2023-11-25 21:09:54 -08:00
ShatianWang
d333c0e062 [BOLT] Extend calculateEmittedSize() for block size calculation (#73076)
This commit modifies BinaryContext::calculateEmittedSize() to update 
the BinaryBasicBlock::OutputAddressRange of each basic block in the
function in place. BinaryBasicBlock::getOutputSize() now gives the 
emitted size of the basic block.
2023-11-23 15:28:31 -05:00
llongint
f3e54f2f97 [BOLT][NFC] Extract a function for dump MCInst (#67225)
In GDB debugging, obtaining the assembly representation of MCInst is
more intuitive.
2023-11-21 20:30:44 +08:00
Maksim Panchenko
84602066a6 [BOLT] Fix C++ exceptions when LPStart is specified (#72737)
Whenever LPStartEncoding was different from DW_EH_PE_omit, we used to
miscalculate LPStart. As a result, landing pads were assigned wrong
addresses. Fix that.
2023-11-20 20:55:38 -08:00
Maksim Panchenko
f653f6d57a [BOLT][NFC] Delete unused declarations (#72596) 2023-11-16 23:36:19 -08:00
JohnLee1243
ae51ec84bb [Bolt] Solving pie support issue (#65494)
Now PIE is default supported after clang 14. It cause parsing error when
using perf2bolt. The reason is the base address can not get correctly.
Fix the method of geting base address. If SegInfo.Alignment is not equal
to pagesize, alignDown(SegInfo.FileOffset, SegInfo.Alignment) can not
equal to FileOffset. So the SegInfo.FileOffset and FileOffset should be
aligned by SegInfo.Alignment first and then judge whether they are
equal.
The .text segment's offset from base address in VAS is aligned by
pagesize. So MMapAddress's offset from base address is
alignDown(SegInfo.Address, pagesize) instead of
alignDown(SegInfo.Address, SegInfo.Alignment). So the base address
calculate way should be changed.

Co-authored-by: Li Zhuohang <lizhuohang3@huawei.com>
2023-11-16 15:05:06 +08:00