237 Commits

Author SHA1 Message Date
Amir Ayupov
fd38366e45 [BOLT][NFC] Clean includes, add license headers (#87200) 2024-03-31 19:29:45 -07:00
Maksim Panchenko
6b1cf00400 [BOLT] Add support for Linux kernel static keys jump table (#86090)
Runtime code modification used by static keys is the most ubiquitous
self-modifying feature of the Linux kernel. The idea is to eliminate
the condition check and associated conditional jump on a hot path if
that condition (based on a boolean value of a static key) does not
change often. Whenever the condition changes, the kernel runtime
modifies all code paths associated with that key, flipping the code
between nop and (unconditional) jump.
2024-03-21 14:05:21 -07:00
Maksim Panchenko
bba790db47 [BOLT] Refactor instruction creation interface. NFCI (#85292)
Refactor MCPlusBuilder's create{Instruction}() functions that used to
return bool. We almost never check the return value as we rely on
llvm_unreachable() to detect unimplemented functionality. There were a
couple of cases that checked the return value, but they would hit the
unreachable condition first (at least in debug builds) before the return
value gets checked.
2024-03-14 13:17:17 -07:00
Mehdi Amini
4a4fb930a5 Use the new ThreadPoolInterface base class instead of the concrete implementation (NFC) (#84056) 2024-03-05 12:37:11 -08:00
Elvina Yakubova
b98e6a5ced [BOLT][AArch64] Skip BBs only instead of functions (#81989)
After [this](846eb76761) commit, we noticed that the size of the fdata
file decreased a lot. A better and more precise approach is to skip only
the basic blocks with exclusive instructions instead of the whole
function.
2024-02-27 19:19:47 +03:00
Amir Ayupov
52cf07116b [BOLT][NFC] Log through JournalingStreams (#81524)
Make core BOLT functionality more friendly to being used as a
library instead of in our standalone driver llvm-bolt. To
accomplish this, we augment BinaryContext with journaling streams
that are to be used by most BOLT code whenever something needs to
be logged to the screen. Users of the library can decide if logs
should be printed to a file, no file or to the screen, as
before. To illustrate this, this patch adds a new option
`--log-file` that allows the user to redirect BOLT logging to a
file on disk or completely hide it by using
`--log-file=/dev/null`. Future BOLT code should now use
`BinaryContext::outs()` for printing important messages instead of
`llvm::outs()`. A new test log.test enforces this by verifying that
no strings are printed to screen once the `--log-file` option is
used.
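
The idea of logging through context-owned streams can be sketched in a
few lines. This is a minimal standalone illustration, not BOLT's
implementation: `LogContext`, `reportFunctionCount` and `captureReport`
are hypothetical names, and plain `std::ostream` stands in for whatever
stream type the real code uses.

```cpp
#include <cassert>
#include <iostream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for a context object that owns the log streams.
class LogContext {
public:
  // Default: behave like a standalone tool and log to the screen.
  LogContext() : Out(&std::cout), Err(&std::cerr) {}

  // Library users pass their own streams (a file, a buffer, a null sink).
  LogContext(std::ostream *OutStream, std::ostream *ErrStream)
      : Out(OutStream), Err(ErrStream) {
    assert(Out && Err && "log streams must not be null");
  }

  std::ostream &outs() const { return *Out; }
  std::ostream &errs() const { return *Err; }

private:
  std::ostream *Out;
  std::ostream *Err;
};

// Code that logs goes through the context and never names std::cout.
inline void reportFunctionCount(const LogContext &Ctx, unsigned Count) {
  Ctx.outs() << "INFO: processed " << Count << " functions\n";
}

// Example caller: capture the log in memory instead of printing it.
inline std::string captureReport(unsigned Count) {
  std::ostringstream Buffer;
  LogContext Ctx(&Buffer, &Buffer);
  reportFunctionCount(Ctx, Count);
  return Buffer.str();
}
```

Because the logging code only sees the context, redirecting output is a
decision made once by whoever constructs it.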

In previous patches we also added a new BOLTError class to report
common and fatal errors, so code shouldn't call exit(1) now. To
easily handle problems as before (by quitting with exit(1)),
callers can now use
`BinaryContext::logBOLTErrorsAndQuitOnFatal(Error)` whenever code
needs to deal with BOLT errors. To test this, we have fatal.s
that checks we are correctly quitting and printing a fatal error
to the screen.

Because this is a significant change by itself, not all code has been
ported yet. Code from Profiler libs (DataAggregator and friends)
still prints errors directly to the screen.

Co-authored-by: Rafael Auler <rafaelauler@fb.com>

Test Plan: NFC
2024-02-12 14:53:53 -08:00
Amir Ayupov
13d60ce2f2 [BOLT][NFC] Propagate BOLTErrors from Core, RewriteInstance, and passes (2/2) (#81523)
As part of the effort to refactor old error handling code that
would directly call exit(1), in this patch we continue the migration
of libCore, libRewrite and libPasses to use the new BOLTError
class whenever a failure occurs.

Test Plan: NFC

Co-authored-by: Rafael Auler <rafaelauler@fb.com>
2024-02-12 14:51:15 -08:00
Amir Ayupov
fa7dd4919a [BOLT][NFC] Add BOLTError and return it from passes (1/2) (#81522)
As part of the effort to refactor old error handling code that
would directly call exit(1), in this patch we add a new class
BOLTError and auxiliary functions `createFatalBOLTError()` and
`createNonFatalBOLTError()` that allow BOLT code to bubble up the
problem to the caller by using the Error class as a return
type (or Expected). Also changes passes to use these.
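
The pattern of returning an error instead of exiting can be sketched
without any LLVM dependency. Everything below is hypothetical and only
illustrates the shape of the idea: `SketchError`, `makeFatalError`,
`makeNonFatalError`, `checkFunctionSize` and `exitCodeFor` are not BOLT
or LLVM APIs, and the real code uses the Error class instead.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical error value for illustration; not BOLT's BOLTError.
class SketchError {
public:
  enum class Kind { Success, NonFatal, Fatal };

  SketchError() : Severity(Kind::Success) {}
  SketchError(Kind K, std::string Msg)
      : Severity(K), Message(std::move(Msg)) {
    assert(K != Kind::Success && "an error needs a failing severity");
  }

  // True when an error is present.
  explicit operator bool() const { return Severity != Kind::Success; }
  bool isFatal() const { return Severity == Kind::Fatal; }
  const std::string &message() const { return Message; }

private:
  Kind Severity;
  std::string Message;
};

inline SketchError makeFatalError(std::string Msg) {
  return SketchError(SketchError::Kind::Fatal, std::move(Msg));
}

inline SketchError makeNonFatalError(std::string Msg) {
  return SketchError(SketchError::Kind::NonFatal, std::move(Msg));
}

// A pass-like check reports problems to its caller instead of exiting.
inline SketchError checkFunctionSize(long Size, long MaxSize) {
  if (Size < 0)
    return makeFatalError("corrupt function size");
  if (Size > MaxSize)
    return makeNonFatalError("function too large, skipped");
  return SketchError();
}

// The caller, not the pass, decides what a failure means for the process.
inline int exitCodeFor(const SketchError &E) {
  return (E && E.isFatal()) ? 1 : 0;
}
```

A standalone driver could turn `exitCodeFor` into a process exit, while
a library user could log the message and continue.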

Co-authored-by: Rafael Auler <rafaelauler@fb.com>

Test Plan: NFC
2024-02-12 14:39:59 -08:00
Amir Ayupov
a5f3d1a803 [BOLT][NFC] Return Error from BinaryFunctionPass::runOnFunctions (#81521)
As part of the effort to refactor old error handling code that
would directly call exit(1), in this patch we change the
interface to `BinaryFunctionPass` to return an Error on
`runOnFunctions()`. This gives passes the ability to report a
serious problem to the caller (RewriteInstance class), so the
caller may decide how to best handle the exceptional situation.

Co-authored-by: Rafael Auler <rafaelauler@fb.com>

Test Plan: NFC
2024-02-12 14:36:12 -08:00
Maksim Panchenko
7fe97f0420 [BOLT] Always run CheckLargeFunctions in non-relocation mode (#80922)
We run CheckLargeFunctions pass in non-relocation mode to prevent the
emission of functions that later could not be written to the output due
to their large size. The main reason behind the pass is to prevent the
emission of metadata for such functions since this metadata becomes
incorrect if the function is left unmodified.

Currently, the pass is enabled in non-relocation mode only when debug
info output is also enabled. As we emit increasingly more kinds of
metadata, e.g. for the Linux kernel, it becomes more challenging to
track metadata that needs to be fixed. Hence, I'm enabling the pass to
always run in non-relocation mode.
2024-02-08 14:21:49 -08:00
Maksim Panchenko
8ea7f1d20a [BOLT][NFCI] Keep instruction annotations (#80382)
We used to delete most instruction annotations before code emission. It
was done to release memory taken by annotations and to reduce overall
memory consumption. However, since the implementation of annotations has
moved to using existing instruction operands, the memory overhead
associated with them has reduced drastically. I measured that savings
are less than 0.5% on large binaries and processing time is just
slightly reduced if we keep them. Additionally, I plan to use
annotations in pre-emission passes for the Linux kernel rewriter.
2024-02-06 19:59:53 -08:00
Amir Ayupov
3c64b24ed3 [BOLT] Add extra staleness logging (#80225)
Report two extra metrics:
- # of stale functions with matching block count,
- # of stale blocks with matching instruction count.
2024-02-01 07:16:40 -08:00
spupyrev
9058503d26 [BOLT] Deprecate hfsort+ in favor of cdsort (#72408)
A new function sorting algorithm (cdsort) in LLVM is an optimized
version of BOLT's hfsort+. To avoid code duplication and simplify
maintenance, this patch removes hfsort+.

Perf-wise this is likely a neutral change, though differences on
individual benchmarks are possible, since the generated function layout
has changed. I tested cdsort vs hfsort+ on a number of open-source and
prod binaries built in different modes and recorded an average neutral
perf difference, perhaps with more "green" counters.
2024-01-26 06:51:55 -08:00
Amir Ayupov
e9309b27d7 [BOLT] Report input staleness (#79496)
It's beneficial to have uniform reporting in both `infer-stale-profile`
on and off cases, primarily for logging purposes.

Without this change, BOLT would report "input" staleness in
`infer-stale-profile=0` case (without matching), and "output" staleness
in `infer-stale-profile=1` case (after matching).

This change makes BOLT report "input" staleness in both cases. "Output"
staleness information is printed separately with "BOLT-INFO: inferred
profile..."
2024-01-25 14:15:13 -08:00
spupyrev
0daf303e79 [BOLT] Fix double conversion in CacheMetrics (#75253)
The change (i) fixes an issue with double-int conversion in CacheMetrics
and (ii) removes command-line options for computing metrics (which aren't
modified anyway).
This change might break some tests verifying the exact output of
CacheMetrics.
2024-01-12 10:27:12 -08:00
ShatianWang
1577483413 [BOLT] Don't split likely fallthrough in CDSplit (#76164)
This diff speeds up CDSplit by not considering any hot-warm splitting
point that could break a fall-through branch from a basic block to its
most likely successor.

Co-authored-by: spupyrev <spupyrev@fb.com>
2023-12-21 16:17:10 -05:00
Kazu Hirata
ad8fd5b185 [BOLT] Use StringRef::{starts,ends}_with (NFC)
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.

I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
2023-12-13 23:34:49 -08:00
Kazu Hirata
1cc5431285 [BOLT] Fix warnings
This patch fixes:

  bolt/lib/Core/BinaryFunctionProfile.cpp:222:10: error: variable
  'BBMergeSI' set but not used [-Werror,-Wunused-but-set-variable]

  bolt/lib/Passes/VeneerElimination.cpp:67:12: error: variable
  'VeneerCallers' set but not used [-Werror,-Wunused-but-set-variable]
2023-12-11 12:55:29 -08:00
Amir Ayupov
b039ccc684 [BOLT] Provide backwards compatibility for YAML profile with std::hash (#74253)
Provide backwards compatibility for YAML profile that uses `std::hash`:
xxh3 hash is the default for newly produced profile (sets `std-hash:
false`),
whereas the profile that doesn't specify `std-hash` will be treated as
`std-hash: true`, preserving old behavior.
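
The defaulting rule can be sketched as follows. This is only an
illustration under assumptions: the header is modeled as a flat
string-to-string map rather than a parsed YAML document, and `HashKind`
and `selectHash` are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <string>

enum class HashKind { StdHash, XXH3 };

// Picks the hash function for a profile from a (hypothetical) flat view
// of its header. A missing `std-hash` key means the profile predates the
// key, so it is treated as `std-hash: true`.
inline HashKind selectHash(const std::map<std::string, std::string> &Header) {
  auto It = Header.find("std-hash");
  if (It == Header.end())
    return HashKind::StdHash; // Old profile: preserve old behavior.
  assert((It->second == "true" || It->second == "false") &&
         "std-hash must be a boolean");
  return It->second == "true" ? HashKind::StdHash : HashKind::XXH3;
}
```

The key point is that absence of the key and `std-hash: false` select
different hashes, which is why new profiles must write the key explicitly.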
2023-12-11 12:27:32 -08:00
sinan
fdb13cf531 [BOLT] Fix local out-of-range stub issue in LongJmp (#73918)
If a local stub is out-of-range, at LongJmp we will try to find another
local stub first. However, the original implementation does not work as
expected and leads to an infinite loop between replaceTargetWithStub
and fixBranches.

After this patch, we first convert the target of the BB back to the target
of the local stub, and then look for other valid local stubs, and so
on.
2023-12-11 10:38:28 +08:00
Ho Cheung
fa5486e487 [BOLT] [Passes] Fix two compile warnings in BOLT (#73086)
Fix build issue on Windows.

issue:#73085

@maksfb PTAL thank you
2023-12-06 11:19:07 -08:00
ShatianWang
296088bdf3 [BOLT][NFC] Remove unused code for CDSplit (#74136)
This diff removes JumpInfo related code that is no longer needed by
CDSplit from SplitFunctions.cpp.
2023-12-01 15:21:30 -05:00
ShatianWang
4483cf2d8b [BOLT] CDSplit main logic part 2/2 (#74032)
This diff implements the main splitting logic of CDSplit. CDSplit
processes functions in a binary in parallel. For each function BF, it
assumes that all other functions are hot-cold split. For each possible
hot-warm split point of BF, it computes its corresponding SplitScore,
and chooses the split point with the best SplitScore. The SplitScore of
each split point is computed in the following way: each call edge or
jump edge has an edge score that is proportional to its execution count,
and inversely proportional to its distance. The SplitScore of a split
point is a sum of edge scores over a fixed set of edges whose distance
can change due to hot-warm splitting BF. This set contains all cover
calls in the form of X->Y or Y->X given function order [... X ... BF ...
Y ...]; we refer to the sum of edge scores over the set of cover calls
as CoverCallScore. This set also contains all jump edges (branches)
within BF as well as all call edges originated from BF; we refer to the
sum of edge scores over this set of edges as LocalScore. CDSplit finds
the split index maximizing CoverCallScore + LocalScore.
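
A toy version of this scoring can make the trade-off concrete. The
sketch below is not the CDSplit implementation: the message only says
that an edge score is proportional to count and inversely proportional
to distance, so the formula `Count / (Distance + 1)` is an assumption,
as are the start-to-start distance measure, the `WarmBase` layout model
and all type and function names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct JumpEdge { size_t Src; size_t Dst; uint64_t Count; };
struct CoverCall { uint64_t Count; uint64_t BaseDistance; };

// Assumed edge score: grows with count, shrinks with distance.
inline double edgeScore(uint64_t Count, uint64_t Distance) {
  return static_cast<double>(Count) / static_cast<double>(Distance + 1);
}

// Score of splitting after block SplitIdx: blocks [0, SplitIdx] stay hot,
// the rest move to a warm fragment that starts at address WarmBase.
inline double splitScore(const std::vector<uint64_t> &BlockSizes,
                         const std::vector<JumpEdge> &Jumps,
                         const std::vector<CoverCall> &Covers,
                         size_t SplitIdx, uint64_t WarmBase) {
  assert(SplitIdx < BlockSizes.size() && "split index out of range");
  // Assign a start address to every block under this split.
  std::vector<uint64_t> Addr(BlockSizes.size());
  uint64_t HotSize = 0, WarmSize = 0;
  for (size_t I = 0; I < BlockSizes.size(); ++I) {
    if (I <= SplitIdx) {
      Addr[I] = HotSize;
      HotSize += BlockSizes[I];
    } else {
      Addr[I] = WarmBase + WarmSize;
      WarmSize += BlockSizes[I];
    }
  }
  // LocalScore: jumps inside the function.
  double Local = 0.0;
  for (const JumpEdge &E : Jumps) {
    uint64_t A = Addr[E.Src], B = Addr[E.Dst];
    Local += edgeScore(E.Count, A > B ? A - B : B - A);
  }
  // CoverCallScore: calls that jump over this function's hot fragment.
  double Cover = 0.0;
  for (const CoverCall &C : Covers)
    Cover += edgeScore(C.Count, C.BaseDistance + HotSize);
  return Cover + Local;
}

// Try every split index and keep the one with the highest score.
inline size_t bestSplitIndex(const std::vector<uint64_t> &BlockSizes,
                             const std::vector<JumpEdge> &Jumps,
                             const std::vector<CoverCall> &Covers,
                             uint64_t WarmBase) {
  assert(!BlockSizes.empty() && "function has no blocks");
  size_t Best = 0;
  double BestScore = splitScore(BlockSizes, Jumps, Covers, 0, WarmBase);
  for (size_t I = 1; I < BlockSizes.size(); ++I) {
    double Score = splitScore(BlockSizes, Jumps, Covers, I, WarmBase);
    if (Score > BestScore) {
      BestScore = Score;
      Best = I;
    }
  }
  return Best;
}
```

With three 10-byte blocks, a hot jump 0->1 (count 100), a cold jump 1->2
(count 1), one cover call (count 50) and a warm fragment at 1000, index 1
wins: keeping block 1 hot preserves the hot jump, while moving block 2
out shortens the cover call.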
2023-11-30 23:17:11 -05:00
ShatianWang
56bbf8135e [BOLT] CDSplit main logic part 1/2 (#73895)
This diff defines and initializes auxiliary variables used by CDSplit
and implements two important helper functions. The first helper function
approximates the block level size increase if a function is hot-warm
split at a given split index (X86 specific). The second helper function
finds all calls in the form of X->Y or Y->X for each BF given function
order [... X ... BF ... Y ...]. These calls are referred to as "cover
calls". Their distance will decrease if BF's hot fragment size is
further reduced by hot-warm splitting. NFC.
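
The cover-call search can be sketched as a filter over call edges. This
is a simplified illustration, not the code from the diff: functions are
identified by their position in the function order, and `CallEdge` and
`findCoverCalls` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// A call between two functions, identified by their positions in the
// final function order.
struct CallEdge { size_t Caller; size_t Callee; uint64_t Count; };

// Returns the indices (into Calls) of all calls X->Y or Y->X where the
// function at BFPos lies strictly between X and Y in the function order.
inline std::vector<size_t> findCoverCalls(const std::vector<CallEdge> &Calls,
                                          size_t NumFunctions, size_t BFPos) {
  assert(BFPos < NumFunctions && "BF position out of range");
  std::vector<size_t> Result;
  for (size_t I = 0; I < Calls.size(); ++I) {
    size_t Lo = std::min(Calls[I].Caller, Calls[I].Callee);
    size_t Hi = std::max(Calls[I].Caller, Calls[I].Callee);
    assert(Hi < NumFunctions && "call endpoint out of range");
    if (Lo < BFPos && BFPos < Hi)
      Result.push_back(I);
  }
  return Result;
}
```

Calls that start or end at BF itself are excluded by the strict
comparisons, since their distance is affected differently.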
2023-11-30 20:55:36 -05:00
ShatianWang
c43d0432ef [BOLT] Create .text.warm for 3-way splitting (#73863)
This commit explicitly adds a warm code section, .text.warm, when
-split-functions -split-strategy=cdsplit is used. This replaces the
previous approach of using .text.cold.0 as warm and .text.cold.1 as cold
in 3-way function splitting. NFC.
2023-11-29 22:42:36 -05:00
ShatianWang
076bd22f57 [BOLT] Add structure of CDSplit to SplitFunctions (#73430)
This commit establishes the general structure of the CDSplit strategy in
SplitFunctions without incorporating the exact splitting logic. With
-split-functions -split-strategy=cdsplit, the SplitFunctions pass will
run twice: the first time is before function reordering and functions
are hot-cold split; the second time is after function reordering and
functions are hot-warm-cold split based on the fixed function ordering.
Currently, all functions are hot-warm split after the entry block in the
second splitting pass. Subsequent commits will introduce the precise
splitting logic. NFC.
2023-11-29 15:43:21 -05:00
llongint
f3e54f2f97 [BOLT][NFC] Extract a function for dump MCInst (#67225)
In GDB debugging, obtaining the assembly representation of MCInst is
more intuitive.
2023-11-21 20:30:44 +08:00
Maksim Panchenko
f633f325a1 [BOLT] Fix NOP instruction emission on x86 (#72186)
Use MCAsmBackend::writeNopData() interface to emit NOP instructions on
x86. There are multiple forms of NOP instruction on x86 with different
sizes. Currently, LLVM's assembly/disassembly does not support all forms
correctly which can lead to a breakage of input code semantics, e.g. if
the program relies on NOP instructions for reserving a patch space.

Add "--keep-nops" option to preserve NOP instructions.
2023-11-13 18:12:39 -08:00
Maksim Panchenko
ec4a03c658 [BOLT] Enhance LowerAnnotations pass. NFCI. (#71847)
After #70147, all primary annotation types are stored directly in the
instruction and hence there's no need for the temporary storage we've
used previously for repopulating preserved annotations.
2023-11-12 19:34:42 -08:00
Vladislav Khmelevsky
6206817380 [BOLT][AArch64] Fix ADR relaxation (#71835)
Currently we have an optimization: if the ADR points to the same
function, we might skip its relaxation. But it doesn't take into account
that the BF might be split; in that situation we still need to relax it.
Just in case, also relax if the initial BF size is >= 1MB.
Fixes #71822
2023-11-10 11:48:03 +04:00
Vladislav Khmelevsky
abec50cb93 [BOLT][AArch64] Fix strict usage during ADR Relax (#71377)
Currently strict mode is used to expand the number of optimized functions,
not to shrink it. Reverse the option usage in the pass, so passing the
strict option relaxes the adr instruction even if there are no nops around
it. Also add a check for a nop after the adr instruction.
2023-11-10 11:46:36 +04:00
Vladislav Khmelevsky
c6c04a83a7 [BOLT] Run EliminateUnreachableBlocks in parallel (#71299)
The wall time for this pass decreased on my laptop from ~80 sec to 5
sec when processing clang.
2023-11-10 00:46:04 +04:00
spaette
1a2f83366b [BOLT] Fix typos (#68121)
Closes https://github.com/llvm/llvm-project/issues/63097

Before merging please make sure the change to
bolt/include/bolt/Passes/StokeInfo.h is correct.

bolt/include/bolt/Passes/StokeInfo.h

```diff
  //  This Pass solves the two major problems to use the Stoke program without
- //  proting its code:
+ //  probing its code:
```

I'm still not happy about the awkward wording in this comment.

bolt/include/bolt/Passes/FixRelaxationPass.h

```
$ ed -s bolt/include/bolt/Passes/FixRelaxationPass.h <<<'9,12p'
// This file declares the FixRelaxations class, which locates instructions with
// wrong targets and fixes them. Such problems usually occures when linker
// relaxes (changes) instructions, but doesn't fix relocations types properly
// for them.
$
```


bolt/docs/doxygen.cfg.in
bolt/include/bolt/Core/BinaryContext.h
bolt/include/bolt/Core/BinaryFunction.h
bolt/include/bolt/Core/BinarySection.h
bolt/include/bolt/Core/DebugData.h
bolt/include/bolt/Core/DynoStats.h
bolt/include/bolt/Core/Exceptions.h
bolt/include/bolt/Core/MCPlusBuilder.h
bolt/include/bolt/Core/Relocation.h
bolt/include/bolt/Passes/FixRelaxationPass.h
bolt/include/bolt/Passes/InstrumentationSummary.h
bolt/include/bolt/Passes/ReorderAlgorithm.h
bolt/include/bolt/Passes/StackReachingUses.h
bolt/include/bolt/Passes/StokeInfo.h
bolt/include/bolt/Passes/TailDuplication.h
bolt/include/bolt/Profile/DataAggregator.h
bolt/include/bolt/Profile/DataReader.h
bolt/lib/Core/BinaryContext.cpp
bolt/lib/Core/BinarySection.cpp
bolt/lib/Core/DebugData.cpp
bolt/lib/Core/DynoStats.cpp
bolt/lib/Core/Relocation.cpp
bolt/lib/Passes/Instrumentation.cpp
bolt/lib/Passes/JTFootprintReduction.cpp
bolt/lib/Passes/ReorderData.cpp
bolt/lib/Passes/RetpolineInsertion.cpp
bolt/lib/Passes/ShrinkWrapping.cpp
bolt/lib/Passes/TailDuplication.cpp
bolt/lib/Rewrite/BoltDiff.cpp
bolt/lib/Rewrite/DWARFRewriter.cpp
bolt/lib/Rewrite/RewriteInstance.cpp
bolt/lib/Utils/CommandLineOpts.cpp
bolt/runtime/instr.cpp
bolt/test/AArch64/got-ld64-relaxation.test
bolt/test/AArch64/unmarked-data.test
bolt/test/X86/Inputs/dwarf5-cu-no-debug-addr-helper.s
bolt/test/X86/Inputs/linenumber.cpp
bolt/test/X86/double-jump.test
bolt/test/X86/dwarf5-call-pc-function-null-check.test
bolt/test/X86/dwarf5-split-dwarf4-monolithic.test
bolt/test/X86/dynrelocs.s
bolt/test/X86/fallthrough-to-noop.test
bolt/test/X86/tail-duplication-cache.s
bolt/test/runtime/X86/instrumentation-ind-calls.s
2023-11-09 11:29:46 -08:00
Vladislav Khmelevsky
485075c095 [BOLT][AArch64] Don't change layout in PatchEntries (#71278)
Because the LongJmp pass is executed before PatchEntries, we can't ignore
the function here since it would change the pre-calculated output layout.
The test reloc-26 relied on the wrong behavior, so it was rewritten as a
unittest.
This is also an attempt to fix #70771
2023-11-08 11:38:46 +04:00
Maksim Panchenko
0df154671b [BOLT] Use Label annotation instead of EHLabel pseudo. NFCI. (#70179)
When we need to attach EH label to an instruction, we can now use Label
annotation instead of EHLabel pseudo instruction.
2023-11-06 14:43:14 -08:00
maksfb
e28c393bd1 [BOLT] Reduce the number of emitted symbols. NFCI. (#70175)
We emit a symbol before an instruction for a number of reasons, e.g. for
tracking LocSyms, debug line, or if the instruction has a label
annotation. Currently, we may emit multiple symbols per instruction.

Reuse the same label instead of creating and emitting new ones when
possible. I'm planning to refactor EH labels as well in a separate diff.

Change getLabel() to return a pointer instead of std::optional<> since
an empty label should be treated identically to no label.
2023-11-06 11:41:47 -08:00
maksfb
7f031d1c7c [BOLT] Fix address mapping for ICP code (#70136)
When we create new code for indirect code promotion optimization, we
should mark it as originating from the indirect jump instruction for
BOLT address translation (BAT) to map it to the original instruction.
2023-11-06 11:25:49 -08:00
spupyrev
287fcd38a1 [BOLT] Rename cds to cdsort (#69966)
Unify naming for the layout algorithms by renaming "cds" to "cdsort".
This is NFC unless someone is already using the new algorithm (which is
unlikely).
2023-11-02 12:46:36 -07:00
Kazu Hirata
f9306f6de3 [ADT] Rename llvm::erase_value to llvm::erase (NFC) (#70156)
C++20 comes with std::erase to erase a value from std::vector.  This
patch renames llvm::erase_value to llvm::erase for consistency with
C++20.

We could make llvm::erase more similar to std::erase by having it
return the number of elements removed, but I'm not doing that for now
because nobody seems to care about that in our code base.

Since there are only 50 occurrences of erase_value in our code base,
this patch replaces all of them with llvm::erase and deprecates
llvm::erase_value.
2023-10-24 23:03:13 -07:00
Kazu Hirata
e1a584305e [BOLT] Use llvm::is_contained (NFC) 2023-10-19 23:21:58 -07:00
Vladislav Khmelevsky
b7944f7c04 [BOLT] Return proper minimal alignment from BF (#67707)
Currently, the minimal alignment of a function is hardcoded to 2 bytes.
Add 2 more cases:
1. If the BF is data in code, return the alignment of the CI as the
minimal alignment.
2. For aarch64 and riscv platforms, return the minimal value of 4 (added
a test for aarch64).
Otherwise, fall back to returning 2, as before.
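
The three cases can be sketched as a small decision function. The names
`Arch`, `minFunctionAlignment`, `IsDataInCode` and `IslandAlignment` are
hypothetical and chosen for this illustration only; the real logic lives
in BinaryFunction and uses its own state.

```cpp
#include <cassert>
#include <cstdint>

enum class Arch { X86_64, AArch64, RISCV };

// Returns the minimal alignment, in bytes, for a function.
// IsDataInCode and IslandAlignment are hypothetical inputs standing in
// for "BF is data in code" and "the alignment of CI".
inline uint16_t minFunctionAlignment(Arch A, bool IsDataInCode,
                                     uint16_t IslandAlignment) {
  if (IsDataInCode) {
    assert(IslandAlignment != 0 &&
           (IslandAlignment & (IslandAlignment - 1)) == 0 &&
           "alignment must be a power of two");
    return IslandAlignment; // Case 1: data in code.
  }
  if (A == Arch::AArch64 || A == Arch::RISCV)
    return 4; // Case 2: per the message, 4 on these platforms.
  return 2; // Fallback: the previously hardcoded value.
}
```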
2023-10-12 09:33:08 +04:00
Job Noorman
43e9eae6e8 [BOLT] Preserve label annotations for injected functions (#68713)
Needed for instrumentation on RISC-V.
2023-10-11 07:26:20 +00:00
qijitao
bae41ff57e [BOLT] Fix long jump negative offset issue. (#67132)
In instruction encoding, the relative offset address of the PC is
signed, that is, the number of positive offset bits and the number of
negative offset bits is asymmetric. Therefore, the maximum and minimum
values are used to replace Mask to determine the boundary.
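
The asymmetry is easy to see in a range check written with explicit
bounds. This is a generic sketch of the technique, not the patched BOLT
code: `fitsSignedOffset` is a hypothetical helper, and it ignores any
scaling that a real instruction encoding applies to the offset.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if Offset is representable as a two's complement signed
// integer of the given width. The range is [-2^(Bits-1), 2^(Bits-1) - 1],
// so there is one more negative value than positive.
inline bool fitsSignedOffset(int64_t Offset, unsigned Bits) {
  assert(Bits > 0 && Bits < 64 && "unsupported offset width");
  const int64_t Max = (int64_t(1) << (Bits - 1)) - 1;
  const int64_t Min = -Max - 1;
  return Offset >= Min && Offset <= Max;
}
```

For an 8-bit field, 127 and -128 fit while 128 and -129 do not; a check
based on a symmetric mask would get one of the two boundaries wrong.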

Co-authored-by: qijitao <qijitao@hisilicon.com>
2023-10-08 01:06:10 +04:00
Job Noorman
ff5e2babcb [BOLT] Improve handling of relocations targeting specific instructions (#66395)
On RISC-V, there are certain relocations that target a specific
instruction instead of a more abstract location like a function or basic
block. Take the following example that loads a value from symbol `foo`:

```
nop
1: auipc t0, %pcrel_hi(foo)
ld t0, %pcrel_lo(1b)(t0)
```

This results in two relocations:
- auipc: `R_RISCV_PCREL_HI20` referencing `foo`;
- ld: `R_RISCV_PCREL_LO12_I` referencing local label `1`, which points
to the auipc instruction.

It is of utmost importance that the `R_RISCV_PCREL_LO12_I` keeps
referring to the auipc instruction; if not, the program will fail to
assemble. However, BOLT currently does not guarantee this.

BOLT currently assumes that all local symbols are jump targets and
always starts a new basic block at symbol locations. The example above
results in a CFG that looks like this:

```
.BB0:
    nop
.BB1:
    auipc t0, %pcrel_hi(foo)
    ld t0, %pcrel_lo(.BB1)(t0)
```

While this currently works (i.e., the `R_RISCV_PCREL_LO12_I` relocation
points to the correct instruction), it has two downsides:
- Too many basic blocks are created (the example above is logically only
  one yet two are created);
- If instructions are inserted in `.BB1` (e.g., by instrumentation),
  things will break since the label will not point to the auipc anymore.

This patch proposes to fix this issue by teaching BOLT to track labels
that should always point to a specific instruction. This is implemented
as follows:
- Add a new annotation type (`kLabel`) that allows us to annotate
  instructions with an `MCSymbol *`;
- Whenever we encounter a relocation type that is used to refer to a
  specific instruction (`Relocation::isInstructionReference`), we
  register it without a symbol;
- During disassembly, whenever we encounter an instruction with such a
  relocation, create a symbol for its target and store it in an offset
  to symbol map (to ensure multiple relocations referencing the same
  instruction use the same label);
- After disassembly, iterate this map to attach labels to instructions
  via the new annotation type;
- During emission, emit these labels right before the instruction.

I believe the use of annotations works quite well for this use case as
it allows us to reliably track instruction labels. If we were to store
them as offsets in basic blocks, it would be error prone to keep them
updated whenever instructions are inserted or removed.

I have chosen to add labels as first-class annotations (as opposed to a
generic one) because the documentation of `MCAnnotation` suggests that
generic annotations are to be used for optional metadata that can be
discarded without affecting correctness. As this is not the case for
labels, a first-class annotation seemed more appropriate.
2023-10-06 06:46:16 +00:00
Job Noorman
7fa33773e3 [BOLT][RISCV] Handle long tail calls (#67098)
Long tail calls use the following instruction sequence on RISC-V:

```
1: auipc xi, %pcrel_hi(sym)
jalr zero, %pcrel_lo(1b)(xi)
```

Since the second instruction in isolation looks like an indirect branch,
this confused BOLT and most functions containing a long tail call got
marked with "unknown control flow" and didn't get optimized as a
consequence.

This patch fixes this by detecting long tail call sequence in
`analyzeIndirectBranch`. `FixRISCVCallsPass` also had to be updated to
expand long tail calls to `PseudoTAIL` instead of `PseudoCALL`.

Besides this, this patch also fixes a minor issue with compressed tail
calls (`c.jr`) not being detected.

Note that I had to change `BinaryFunction::postProcessIndirectBranches`
slightly: the documentation of `MCPlusBuilder::analyzeIndirectBranch`
mentions that the [`Begin`, `End`) range contains the instructions
immediately preceding `Instruction`. However, in
`postProcessIndirectBranches`, *all* the instructions in the BB were
passed in the range. This made it difficult to find the preceding
instruction so I made sure *only* the preceding instructions are passed.
2023-10-05 08:55:30 +00:00
Vladislav Khmelevsky
f99bd29610 [BOLT][NFC] Run ADRRelaxationPass in parallel (#67831)
To do this:
1. Protect BC.Ctx with mutex
2. Don't call exit from a thread; see the comment near the PassFailed
variable definition for the reason. The other option would be to call
_Exit instead of exit, but I think we should call destructors properly.
2023-09-30 13:47:41 +04:00
Vladislav Khmelevsky
08086c1529 [BOLT][AArch64] Fix CI alignment
Fix alignment calculation for CI.

Differential Revision: https://reviews.llvm.org/D159548
2023-09-28 12:55:57 +04:00
Vladislav Khmelevsky
846eb76761 [BOLT][AArch64] Fix instrumentation deadloop
According to ARMv8-a architecture reference manual B2.10.5 software
must avoid having any explicit memory accesses between exclusive load
and associated store instruction. Otherwise exclusive monitor might
clear the exclusivity without application-related cause which may
result in the deadloop. Disable instrumentation for such functions,
since between exclusive load and store there might be branches and we
would insert instrumentation snippet which contains loads and stores.

The better solution would be to analyze with BFS, finding the exact BBs
between load and store and not instrumenting them. Or, even better, to
recognize such sequences and replace them with a more complex one, e.g.
loading the value non-exclusively and, for the branch where the exclusive
store is made, making the exclusive load and store sequentially. But for
now, just disable instrumentation for such functions completely.

Differential Revision: https://reviews.llvm.org/D159520
2023-09-22 00:58:01 +04:00
Fangrui Song
6b8d04c23d [CodeLayout] Refactor std::vector uses, namespace, and EdgeCountT. NFC
* Place types and functions in the llvm::codelayout namespace
* Change EdgeCountT from pair<pair<uint64_t, uint64_t>, uint64_t> to a struct and utilize structured bindings.
  It is not conventional to use the "T" suffix for structure types.
* Remove a redundant copy in ChainT::merge.
* Change {ExtTSPImpl,CDSortImpl}::run to use return value instead of an output parameter
* Rename applyCDSLayout to computeCacheDirectedLayout: (a) avoid rare
  abbreviation "CDS" (cache-directed sort) (b) "compute" is more conventional
  for the specific use case
* Change the parameter types from std::vector to ArrayRef so that
  SmallVector arguments can be used.
* Similarly, rename applyExtTspLayout to computeExtTspLayout.

Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D159526
2023-09-21 13:13:03 -07:00
Job Noorman
dc925be68b [BOLT][RISCV] Carry-over annotations when fixing calls (#66763)
`FixRISCVCallsPass` changes all different forms of calls to `PseudoCALL`
instructions. However, the original call's annotations were lost in the
process.

This patch fixes this by moving all annotations from the old to the new
call. `MCPlusBuilder::moveAnnotations` had to be made public for this.
2023-09-21 06:37:47 +00:00