Commit Graph

113 Commits

Author SHA1 Message Date
Amir Ayupov
fd38366e45 [BOLT][NFC] Clean includes, add license headers (#87200)
2024-03-31 19:29:45 -07:00
Maksim Panchenko
7de82ca369 [BOLT] Don't terminate on trap instruction for Linux kernel (#87021)
Under normal circumstances, we terminate basic blocks on a trap
instruction. However, Linux kernel may resume execution after hitting a
trap (ud2 on x86). Thus, we introduce "--terminal-trap" option that will
specify if the trap instruction should terminate the control flow. The
option is on by default except for the Linux kernel mode when it's off.
2024-03-29 16:41:15 -07:00
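A minimal sketch of the `--terminal-trap` behavior described in 7de82ca369, assuming a toy model where instructions are plain mnemonic strings. The function `split_basic_blocks` and its terminator set are hypothetical illustrations, not BOLT's implementation.

```python
from typing import List

# Hypothetical set of mnemonics that always end a basic block in this toy model.
ALWAYS_TERMINATORS = {"ret", "jmp"}


def split_basic_blocks(insts: List[str], terminal_trap: bool = True) -> List[List[str]]:
    """Split a linear instruction list into basic blocks.

    When terminal_trap is True, a trap (ud2) ends the current block.
    When it is False (the Linux kernel case), execution is assumed to be
    able to resume after the trap, so the block continues.
    """
    blocks: List[List[str]] = []
    current: List[str] = []
    for inst in insts:
        current.append(inst)
        ends_block = inst in ALWAYS_TERMINATORS or (inst == "ud2" and terminal_trap)
        if ends_block:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks
```

With the trap treated as terminal, `["mov", "ud2", "add", "ret"]` splits into two blocks; with `terminal_trap=False` it stays one block.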
Maksim Panchenko
6b1cf00400 [BOLT] Add support for Linux kernel static keys jump table (#86090)
Runtime code modification used by static keys is the most ubiquitous
self-modifying feature of the Linux kernel. The idea is to eliminate
the condition check and associated conditional jump on a hot path if
that condition (based on a boolean value of a static key) does not
change often. Whenever the condition changes, the kernel runtime
modifies all code paths associated with that key flipping the code
between nop and (unconditional) jump.
2024-03-21 14:05:21 -07:00
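The nop/jump flipping described in 6b1cf00400 can be sketched as follows. This is a simplified model under stated assumptions: `JumpEntry`, `patch_sites`, and the dictionary standing in for code memory are hypothetical, and the real kernel also distinguishes likely and unlikely branches, which this sketch ignores.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class JumpEntry:
    code_addr: int    # address of the patchable instruction
    target_addr: int  # destination of the jump when the site is enabled
    key: str          # name of the static key that controls this site


def patch_sites(code: Dict[int, str], table: List[JumpEntry], key: str, enabled: bool) -> None:
    """Flip every code site associated with `key` between nop and jump."""
    for entry in table:
        if entry.key == key:
            if enabled:
                code[entry.code_addr] = f"jmp {entry.target_addr:#x}"
            else:
                code[entry.code_addr] = "nop"
```

A tool that rewrites such a binary has to keep the table entries in sync with the new addresses of the patched sites and their targets, which is why the jump table needs explicit support.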
Maksim Panchenko
49b8a99a0f [BOLT] Add createCondBranch() and createLongUncondBranch() (#85315)
Add MCPlusBuilder interface for creating two new branch types.
2024-03-14 15:28:22 -07:00
Maksim Panchenko
bba790db47 [BOLT] Refactor instruction creation interface. NFCI (#85292)
Refactor MCPlusBuilder's create{Instruction}() functions that used to
return bool. We almost never check the return value as we rely on
llvm_unreachable() to detect unimplemented functionality. There were a
couple of cases that checked the return value, but they would hit the
unreachable condition first (at least in debug builds) before the return
value gets checked.
2024-03-14 13:17:17 -07:00
Maksim Panchenko
59ab86bb2f [BOLT] Clear operands when creating new instructions. NFCI (#85191)
Reset operand list whenever we create a new instruction via a parameter
passed by reference. Most functions were already doing this, but there
are several places missing the reset. Potentially, if we do not clear
the list it could lead to invalid instruction operands. But the existing
code is unaffected.
2024-03-14 11:00:08 -07:00
sinan
71c2a132b2 [BOLT] support AArch64 JUMP26 createRelocation (#83531)
Add R_AARCH64_JUMP26 implementation for createRelocation, which
could significantly reduce the number of failed scan-refs cases if we
run BOLT on a selected range of functions.
2024-03-04 17:11:47 +08:00
Elvina Yakubova
b98e6a5ced [BOLT][AArch64] Skip BBs only instead of functions (#81989)
After [this](846eb76761) commit we noticed that the size of the fdata file
decreased a lot. A better and more precise approach is to skip only the
basic blocks with exclusive instructions instead of the whole function.
2024-02-27 19:19:47 +03:00
Amir Ayupov
52cf07116b [BOLT][NFC] Log through JournalingStreams (#81524)
Make core BOLT functionality more friendly to being used as a
library instead of in our standalone driver llvm-bolt. To
accomplish this, we augment BinaryContext with journaling streams
that are to be used by most BOLT code whenever something needs to
be logged to the screen. Users of the library can decide if logs
should be printed to a file, no file or to the screen, as
before. To illustrate this, this patch adds a new option
`--log-file` that allows the user to redirect BOLT logging to a
file on disk or completely hide it by using
`--log-file=/dev/null`. Future BOLT code should now use
`BinaryContext::outs()` for printing important messages instead of
`llvm::outs()`. A new test log.test enforces this by verifying that
no strings are printed to screen once the `--log-file` option is
used.

In previous patches we also added a new BOLTError class to report
common and fatal errors, so code shouldn't call exit(1) now. To
easily handle problems as before (by quitting with exit(1)),
callers can now use
`BinaryContext::logBOLTErrorsAndQuitOnFatal(Error)` whenever code
needs to deal with BOLT errors. To test this, we have fatal.s
that checks we are correctly quitting and printing a fatal error
to the screen.

Because this is a significant change by itself, not all code has
been ported yet. Code from Profiler libs (DataAggregator and friends)
still prints errors directly to screen.

Co-authored-by: Rafael Auler <rafaelauler@fb.com>

Test Plan: NFC
2024-02-12 14:53:53 -08:00
Amir Ayupov
13d60ce2f2 [BOLT][NFC] Propagate BOLTErrors from Core, RewriteInstance, and passes (2/2) (#81523)
As part of the effort to refactor old error handling code that
would directly call exit(1), this patch continues the migration
on libCore, libRewrite and libPasses to use the new BOLTError
class whenever a failure occurs.

Test Plan: NFC

Co-authored-by: Rafael Auler <rafaelauler@fb.com>
2024-02-12 14:51:15 -08:00
Maksim Panchenko
082fe9a5dd [BOLT] Remove duplicate expression (#80380)
Reported by the cppcheck static analyzer in #80111.

Fixes #80111.
2024-02-01 19:05:11 -08:00
eleviant
f20af7372f [bolt] Support arm64 FP register spills (#73021)
At the moment llvm-bolt fails when analyzing jump tables on aarch64 in
case FP register spill/reload is used.
2023-12-05 20:32:58 +01:00
Maksim Panchenko
0df154671b [BOLT] Use Label annotation instead of EHLabel pseudo. NFCI. (#70179)
When we need to attach EH label to an instruction, we can now use Label
annotation instead of EHLabel pseudo instruction.
2023-11-06 14:43:14 -08:00
Vladislav Khmelevsky
888742a121 [BOLT][AArch64] Handle .plt.got section (#71216)
It seems that currently this section is only created by the mold linker
if two conditions are met: 1. The PLT function is called directly. 2. An
indirect access to the PLT function is found (e.g. through an ADRP
relocation). Although mold created a symbol for every PLT entry, I've
removed them in the YAML file to check that .plt.got is truly disassembled
by BOLT.
2023-11-04 00:47:24 +04:00
Job Noorman
b6b492880f [BOLT][RISCV] Set minimum function alignment to 2 for RVC (#69837)
In #67707, the minimum function alignment on RISC-V was set to 4. When
RVC (compressed instructions) is enabled, the minimum alignment can be
reduced to 2.

This patch implements this by delegating the choice of minimum alignment
to a new `MCPlusBuilder::getMinFunctionAlignment` function. This way,
the target-dependent code in `BinaryFunction` is minimized.
2023-10-23 08:09:11 +00:00
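The alignment rule from b6b492880f can be sketched as below. The Python names `min_function_alignment` and `align_up` are hypothetical; the source only names `MCPlusBuilder::getMinFunctionAlignment`, whose actual signature is not shown here.

```python
def min_function_alignment(has_compressed: bool) -> int:
    """Minimum function alignment in bytes on RISC-V.

    With RVC (compressed instructions) enabled, instructions may be 2 bytes
    long, so 2-byte alignment is sufficient; otherwise 4 bytes are required.
    """
    return 2 if has_compressed else 4


def align_up(addr: int, alignment: int) -> int:
    """Round addr up to the next multiple of alignment (a power of two)."""
    return (addr + alignment - 1) & ~(alignment - 1)
```

For an address such as `0x1002`, the RVC case leaves the function in place, while the non-RVC case moves it to `0x1004`.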
Job Noorman
3ab536fb99 [BOLT][RISCV] Implement getCalleeSavedRegs (#69161)
The main reason for implementing this now is to ensure the
`assume=abi.test` test passes on RISC-V. Since it uses
`--indirect-call-promotion=all`, it requires some support for register
analysis on the target.

Further testing and implementation of register/frame analysis on RISC-V
will come later.
2023-10-16 08:52:56 +00:00
Job Noorman
d8de38b401 [BOLT][RISCV] Handle EH_LABEL operands (#68998)
Fixes the `runtime/exceptions-no-pie.cpp` test on RISC-V.
2023-10-16 08:29:28 +00:00
Job Noorman
5c0931727e [BOLT][RISCV] Implement MCPlusBuilder::equals (#68989)
This enables ICF for RISC-V.

No tests are added by this commit as `bolt-icf.test` covers this case
(only on a RISC-V host though).
2023-10-16 07:13:07 +00:00
Job Noorman
8fb83bf5f1 [BOLT][NFC] Add MCSubtargetInfo to MCPlusBuilder (#68223)
On RISC-V, it's helpful to have access to `MCSubtargetInfo` while
generating instructions in `MCPlusBuilder`. For example, a return
instruction might be generated differently based on if the target
supports compressed instructions (`c.jr ra`) or not (`jalr ra`).
2023-10-06 06:39:58 +00:00
Job Noorman
7fa33773e3 [BOLT][RISCV] Handle long tail calls (#67098)
Long tail calls use the following instruction sequence on RISC-V:

```
1: auipc xi, %pcrel_hi(sym)
jalr zero, %pcrel_lo(1b)(xi)
```

Since the second instruction in isolation looks like an indirect branch,
this confused BOLT and most functions containing a long tail call got
marked with "unknown control flow" and didn't get optimized as a
consequence.

This patch fixes this by detecting the long tail call sequence in
`analyzeIndirectBranch`. `FixRISCVCallsPass` also had to be updated to
expand long tail calls to `PseudoTAIL` instead of `PseudoCALL`.

Besides this, this patch also fixes a minor issue with compressed tail
calls (`c.jr`) not being detected.

Note that I had to change `BinaryFunction::postProcessIndirectBranches`
slightly: the documentation of `MCPlusBuilder::analyzeIndirectBranch`
mentions that the [`Begin`, `End`) range contains the instructions
immediately preceding `Instruction`. However, in
`postProcessIndirectBranches`, *all* the instructions in the BB were
passed in the range. This made it difficult to find the preceding
instruction so I made sure *only* the preceding instructions are passed.
2023-10-05 08:55:30 +00:00
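A minimal sketch of the long tail call detection described in 7fa33773e3, assuming a toy instruction record. `Inst` and `is_long_tail_call` are hypothetical names; the real check lives in `analyzeIndirectBranch` and works on `MCInst` objects.

```python
from typing import NamedTuple, Optional, Sequence


class Inst(NamedTuple):
    mnemonic: str
    rd: str
    rs1: Optional[str] = None


def is_long_tail_call(preceding: Sequence[Inst], branch: Inst) -> bool:
    """Return True if `branch` closes an auipc/jalr long tail call sequence.

    `preceding` holds only the instructions immediately before `branch`,
    mirroring the [Begin, End) range mentioned in the commit message.
    """
    # A tail call does not link: the destination register is zero.
    if branch.mnemonic != "jalr" or branch.rd != "zero":
        return False
    if not preceding:
        return False
    prev = preceding[-1]
    # The auipc must define the register that the jalr jumps through.
    return prev.mnemonic == "auipc" and prev.rd == branch.rs1
```

Passing only the preceding instructions makes `preceding[-1]` the natural place to look for the `auipc`, which is the reason the commit narrows the range.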
Job Noorman
c7d6d62252 [BOLT][RISCV] Implement TLS le/ie relocations (#67112)
Handle the following relocations related to TLS local-exec and
initial-exec:
- R_RISCV_TLS_GOT_HI20
- R_RISCV_TPREL_HI20
- R_RISCV_TPREL_ADD
- R_RISCV_TPREL_LO12_I
- R_RISCV_TPREL_LO12_S

In addition, GNU ld has a quirk where after TLS le relaxation, two
unofficial relocation types may be emitted:
- R_RISCV_TPREL_I
- R_RISCV_TPREL_S

Since they are unofficial (defined in the reserved range of relocation
types), LLVM does not define them. Hence, I've defined them locally in
BOLT in a private namespace.
2023-10-05 08:53:51 +00:00
Rafael Auler
853e126ce3 [BOLT] Support input binaries that use R_X86_GOTPC64
In large code model, the address of GOT is calculated by the
static linker via R_X86_GOTPC64 reloc applied against a MOVABSQ
instruction. In the final binary, it can be disassembled as a regular
immediate, but because such immediate is the result of PC-relative
pointer arithmetic, we need to parse this relocation and update this
calculation whenever we move code, otherwise we break the code trying
to read GOT.

A test case showing how GOT is accessed was provided.

Reviewed By: #bolt, maksfb

Differential Revision: https://reviews.llvm.org/D158911
2023-10-02 23:12:44 -07:00
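To illustrate why the immediate in 853e126ce3 must be recomputed when code moves: the x86-64 psABI gives the GOTPC64 calculation as GOT + A - P, where P is the place the relocation is applied. The sketch below uses hypothetical helper names and made-up addresses.

```python
MASK64 = (1 << 64) - 1


def gotpc64_immediate(got_addr: int, addend: int, place: int) -> int:
    """Compute the 64-bit immediate GOT + A - P, wrapped to 64 bits."""
    return (got_addr + addend - place) & MASK64


def to_signed64(value: int) -> int:
    """Interpret a 64-bit pattern as a signed integer."""
    return value - (1 << 64) if value & (1 << 63) else value


# Made-up addresses: the GOT stays put while the code is relocated.
GOT = 0x600000
old_imm = gotpc64_immediate(GOT, 0, 0x401000)
new_imm = gotpc64_immediate(GOT, 0, 0x801000)
```

Keeping `old_imm` after moving the code to `0x801000` would yield an address `0x400000` bytes past the GOT, which is the breakage the commit message describes.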
Job Noorman
9555736ac6 [BOLT][RISCV] Implement LO/HI relocations (#67444)
Implement the following relocations used by the medlow code model and
non-PIE binaries:
- R_RISCV_HI20
- R_RISCV_LO12_I
- R_RISCV_LO12_S
2023-09-26 15:54:11 +00:00
Kepontry
2d902d0f88 [BOLT] Implement '--assume-abi' option for AArch64
This patch implements the `getCalleeSavedRegs` function for AArch64,
addressing the issue where the "not implemented" error occurs when
both the `--assume-abi` option and options related to the
RegAnalysis Pass (e.g., `--indirect-call-promotion=all`) are enabled.
2023-09-25 21:55:29 +08:00
Vladislav Khmelevsky
846eb76761 [BOLT][AArch64] Fix instrumentation deadloop
According to the ARMv8-A Architecture Reference Manual, section B2.10.5,
software must avoid having any explicit memory accesses between an
exclusive load and the associated store instruction. Otherwise the
exclusive monitor might clear the exclusivity without an
application-related cause, which may result in a deadloop. Disable
instrumentation for such functions, since between the exclusive load and
store there might be branches, and we would insert an instrumentation
snippet that contains loads and stores.

A better solution would be to use BFS to find the exact BBs between the
load and the store and not instrument them. Even better would be to
recognize such sequences and replace them with a more complex one, e.g.
loading the value non-exclusively and, on the branch where the exclusive
store is made, performing the exclusive load and store sequentially. For
now, just disable instrumentation for such functions completely.

Differential Revision: https://reviews.llvm.org/D159520
2023-09-22 00:58:01 +04:00
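The BFS idea mentioned in 846eb76761 could look roughly like the sketch below. It is not what this commit implements (the commit disables instrumentation for the whole function); the function name and the CFG representation as a successor map are assumptions.

```python
from collections import deque
from typing import Dict, List, Set


def blocks_in_exclusive_region(
    succs: Dict[str, List[str]],
    load_blocks: Set[str],
    store_blocks: Set[str],
) -> Set[str]:
    """Collect blocks reachable from an exclusive load up to an exclusive store.

    The walk starts at every block containing an exclusive load and stops
    expanding at blocks containing an exclusive store. Paths that leave the
    region without reaching a store are included too, so the result is an
    over-approximation of the blocks that should not be instrumented.
    """
    region: Set[str] = set()
    queue = deque(sorted(load_blocks))
    while queue:
        bb = queue.popleft()
        if bb in region:
            continue
        region.add(bb)
        if bb in store_blocks:
            continue  # the store closes the region; do not walk past it
        queue.extend(succs.get(bb, []))
    return region
```

Blocks outside the returned set could still be instrumented, which would preserve more profile data than skipping the whole function.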
Job Noorman
c5ba61978c [BOLT][RISCV] Add support for linker relaxation
Calls on RISC-V are typically compiled to `auipc`/`jalr` pairs to allow
a maximum target range (32-bit pc-relative). In order to optimize calls
to near targets, linker relaxation may replace those pairs with, for
example, single `jal` instructions.

To allow BOLT to freely reassign function addresses in relaxed binaries,
this patch proposes the following approach:
- Expand all relaxed calls back to `auipc`/`jalr`;
- Rely on JITLink to relax those back to shorter forms where possible.

This is implemented by detecting all possible call instructions and
replacing them with `PseudoCALL` (or `PseudoTAIL`) instructions. The
RISC-V backend then expands those and adds the necessary relocations for
relaxation.

Since BOLT generally ignores pseudo instructions, this patch makes
`MCPlusBuilder::isPseudo` virtual so that `RISCVMCPlusBuilder` can
override it to exclude `PseudoCALL` and `PseudoTAIL`.

To ensure JITLink knows about the correct section addresses while
relaxing, reassignment of addresses has been moved to a post-allocation
pass. Note that this is probably the time it had to be done in the
first place since in `notifyResolved` (where it was done before), all
symbols are supposed to be resolved already.

Depends on D159082

Reviewed By: maksfb

Differential Revision: https://reviews.llvm.org/D159089
2023-09-15 11:57:28 +02:00
Job Noorman
1b78742e77 [BOLT][RISCV] Implement R_RISCV_PCREL_LO12_S (#65204)
Relocation used for store instructions.
2023-09-09 08:22:37 +00:00
Job Noorman
eafe4ee2e8 [BOLT] Rename isLoad/isStore to mayLoad/mayStore
As discussed in D159266, for some instructions it's impossible to know
statically if they will load/store (e.g., predicated instructions).
Therefore, mayLoad/mayStore are more appropriate names.
2023-09-01 09:36:05 +02:00
Elvina Yakubova
70405a0bf7 [BOLT][Instrumentation] Add support for MacOS counters
This commit adds support for generation of getter counters for AArch64 MacOS.
Continuation of work D151899

Reviewed By: rafauler, yota9

Differential Revision: https://reviews.llvm.org/D151901
2023-08-24 19:34:57 +03:00
Elvina Yakubova
6e4c230525 [BOLT][Instrumentation] Initial instrumentation support for AArch64
This commit adds code generation for AArch64 instrumentation,
including direct and indirect calls support.

Reviewed By: rafauler, yota9

Differential Revision: https://reviews.llvm.org/D151899
2023-08-24 19:34:57 +03:00
Denis Revunov
28fd2ca142 [BOLT] Fix trap value for non-X86
The trap value used by BOLT was assumed to be a single-byte instruction.
This made some functions unaligned on AArch64 (e.g. in the
exceptions-instrumentation test) and caused emission failures. Fix that
by changing the fill value to a StringRef.

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D158191
2023-08-24 01:29:41 +03:00
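A small sketch of why a single-byte fill value does not work on AArch64 (28fd2ca142). The byte patterns are the standard encodings of x86 `int3` and AArch64 `brk #0`; whether BOLT uses exactly these values, and how it handles sizes that are not a multiple of the pattern, are assumptions here. `fill_padding` is a hypothetical helper.

```python
X86_TRAP = b"\xcc"                  # int3, one byte
AARCH64_TRAP = b"\x00\x00\x20\xd4"  # brk #0 (0xd4200000), little-endian


def fill_padding(size: int, trap: bytes) -> bytes:
    """Fill `size` bytes of padding with whole copies of a trap instruction."""
    if size % len(trap) != 0:
        # A partial instruction would leave following code misaligned.
        raise ValueError("padding size is not a multiple of the trap size")
    return trap * (size // len(trap))
```

Treating the fill value as a byte string rather than a single byte lets the same padding logic serve both fixed-width and variable-width instruction sets.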
zhoujiapeng
62020a3a7e [BOLT] Implement createRelocation for AArch64
The implementation is based on the X86 version, with the same code
for symbol and addend extraction. The differences include the
support for RelType `R_AARCH64_CALL26` and the removal of the 8-bit
relocation.

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D156018
2023-08-23 00:53:32 +08:00
zhoujiapeng
9fee2ac044 [BOLT][NFC] Split createRelocation in X86 and share the second part
This commit splits the createRelocation function for the X86 architecture
into two parts, retaining the first half and moving the second half to a
new function called extractFixupExpr. The purpose of this change is to make
extractFixupExpr a shared function between AArch64 and X86 architectures,
increasing code reusability and maintainability.

Child revision: https://reviews.llvm.org/D156018

Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D157217
2023-08-23 00:29:25 +08:00
Job Noorman
b6556dc9fe [BOLT][RISCV] Fix implementation of getTargetSymbol
- Correctly handle OpNum == 0 (auto select operand)
- Implement MCExpr overload

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D153343
2023-06-21 10:21:00 +02:00
Job Noorman
41b8aed499 [BOLT][RISCV] Implement branch reversal
Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D153344
2023-06-21 10:21:00 +02:00
Job Noorman
5e67ae151e [BOLT][RISCV] Implement return/unconditional branch creation
Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D153342
2023-06-21 10:21:00 +02:00
Job Noorman
f873029386 [BOLT] Add minimal RISC-V 64-bit support
Just enough features are implemented to process a simple "hello world"
executable and produce something that still runs (including libc calls).
This was mainly a matter of implementing support for various
relocations. Currently, the following are handled:

- R_RISCV_JAL
- R_RISCV_CALL
- R_RISCV_CALL_PLT
- R_RISCV_BRANCH
- R_RISCV_RVC_BRANCH
- R_RISCV_RVC_JUMP
- R_RISCV_GOT_HI20
- R_RISCV_PCREL_HI20
- R_RISCV_PCREL_LO12_I
- R_RISCV_RELAX
- R_RISCV_NONE

Executables linked with linker relaxation will probably fail to be
processed. BOLT relocates .text to a high address while leaving .plt at
its original (low) address. This causes PC-relative PLT calls that were
relaxed to a JAL to not fit their offset in an I-immediate anymore. This
is something that will be addressed in a later patch.

Changes to the BOLT core are relatively minor. Two things were tricky to
implement and needed slightly larger changes. I'll explain those below.

The R_RISCV_CALL(_PLT) relocation is put on the first instruction of an
AUIPC/JALR pair; the second does not get any relocation (unlike other
PCREL pairs). This causes issues with the combination of the way BOLT
processes binaries and the way the RISC-V MC-layer handles relocations:
- BOLT reassembles instructions one by one and since the JALR doesn't
  have a relocation, it simply gets copied without modification;
- Even though the MC-layer handles R_RISCV_CALL properly (adjusts both
  the AUIPC and the JALR), it assumes the immediates of both
  instructions are 0 (to be able to or-in a new value). This will most
  likely not be the case for the JALR that got copied over.

To handle this difficulty without resorting to RISC-V-specific hacks in
the BOLT core, a new binary pass was added that searches for
AUIPC/JALR pairs and zeroes-out the immediate of the JALR.

A second difficulty was supporting ABS symbols. As far as I can tell,
ABS symbols were not handled at all, causing __global_pointer$ to break.
RewriteInstance::analyzeRelocation was updated to handle these
generically.

Tests are provided for all supported relocations. Note that in order to
test the correct handling of PLT entries, an ELF file produced by GCC
had to be used. While I tried to strip the YAML representation, it's
still quite large. Any suggestions on how to improve this would be
appreciated.

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D145687
2023-06-16 12:19:36 +02:00
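The binary pass described in f873029386, which zeroes the immediate of the JALR in an AUIPC/JALR pair, can be sketched on raw 32-bit encodings. This is an illustration only: the actual pass operates on BOLT's instruction representation, and `zero_jalr_immediates` is a hypothetical name. The field positions follow the RISC-V base ISA encoding (opcode in bits 6:0, rd in bits 11:7, rs1 in bits 19:15, I-type immediate in bits 31:20).

```python
from typing import List

OPCODE_AUIPC = 0x17
OPCODE_JALR = 0x67


def opcode(word: int) -> int:
    return word & 0x7F


def rd(word: int) -> int:
    return (word >> 7) & 0x1F


def rs1(word: int) -> int:
    return (word >> 15) & 0x1F


def zero_jalr_immediates(words: List[int]) -> List[int]:
    """Clear the I-immediate of every JALR that directly follows its AUIPC.

    Assumes a stream of uncompressed 32-bit instructions.
    """
    out = list(words)
    for i in range(len(out) - 1):
        first, second = out[i], out[i + 1]
        is_pair = (
            opcode(first) == OPCODE_AUIPC
            and opcode(second) == OPCODE_JALR
            and rd(first) == rs1(second)
        )
        if is_pair:
            out[i + 1] = second & 0x000FFFFF  # keep bits 19:0, clear imm[11:0]
    return out
```

Once the immediate is zero, the MC-layer can or-in the new offset when it applies R_RISCV_CALL, which is the assumption the commit message says it relies on.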
Maksim Panchenko
5c4d306a10 [BOLT][NFC] Change signature of MCPlusBuilder::isUnsupportedBranch()
Make MCPlusBuilder::isUnsupportedBranch() take MCInst, not opcode.

Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D152765
2023-06-13 12:20:36 -07:00
Maksim Panchenko
43f56a2f27 [BOLT] Fix handling of code references from unmodified code
In lite mode (default for X86), BOLT optimizes and relocates functions
with profile. The rest of the code is preserved, but if it references
relocated code such references have to be updated. The update is handled
by scanExternalRefs() function. Note that we cannot solely rely on
relocations written by the linker, as not all code references are
exposed to the linker. Additionally, the linker can modify certain
instructions and relocations will no longer match the code.

With this change, start using symbolic disassembler for scanning code
for references in scanExternalRefs(). Unlike the previous approach, the
symbolizer properly detects and creates references for instructions with
multiple/ambiguous symbolic operands and handles cases where a
relocation doesn't match any operand. See test cases for examples.

Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D152631
2023-06-12 10:46:51 -07:00
Shengchen Kan
3f1e9468f6 [X86][MC][bolt] Share code between encoding optimization and assembler relaxation, NFCI
PUSH[16|32|64]i[8|32] are not arithmetic instructions, so I renamed the
functions.

Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D151028
2023-05-21 09:31:50 +08:00
Shengchen Kan
89ca4eb002 [X86][NFC] Correct the instruction names for PUSH16i, PUSH32i
Reviewed By: maksfb

Differential Revision: https://reviews.llvm.org/D151012
2023-05-20 17:33:42 +08:00
Amir Ayupov
b6f07d3ae8 [BOLT][NFC] Add MCPlusBuilder defOperands/useOperands helpers
Make intent more explicit with the use of new helper methods.

Reviewed By: #bolt, maksfb

Differential Revision: https://reviews.llvm.org/D150810
2023-05-17 21:52:33 -07:00
spupyrev
3e3a926be8 [BOLT][NFC] Add hash computation for basic blocks
Extend the YAML profile format with block hashes, which are used for stale
profile matching. To avoid code duplication, create a new class with a
collection of utilities for computing hashes.

Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D144306
2023-05-02 14:03:47 -07:00
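A rough sketch of how block hashes can support stale profile matching (3e3a926be8). The hash function, its inputs, and the matching strategy below are assumptions chosen for illustration; the commit does not describe BOLT's actual hash in this log.

```python
import hashlib
from typing import Dict, List


def block_hash(opcodes: List[str]) -> str:
    """Hash a basic block by its opcode sequence (hypothetical scheme)."""
    h = hashlib.blake2b(digest_size=8)
    for op in opcodes:
        h.update(op.encode())
        h.update(b"\0")  # separator so ["ab", "c"] and ["a", "bc"] differ
    return h.hexdigest()


def match_blocks(
    profile_hashes: Dict[int, str],
    binary_blocks: Dict[int, List[str]],
) -> Dict[int, int]:
    """Map profiled block indices to block indices in the current binary."""
    by_hash: Dict[str, int] = {}
    for index, ops in binary_blocks.items():
        # Keep the first block seen for each hash.
        by_hash.setdefault(block_hash(ops), index)
    return {p: by_hash[h] for p, h in profile_hashes.items() if h in by_hash}
```

If a new block is inserted into a function after profiling, index-based matching shifts every later block, whereas hash-based matching can still pair the unchanged blocks.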
Nathan Sidwell
f84ac48f1e [BOLT] Add BOLT_TARGETS_TO_BUILD
Adds BOLT_TARGETS_TO_BUILD, which defaults to the intersection of
X86;AArch64 and LLVM_TARGETS_TO_BUILD, but allows configuration to
alter that -- for instance omitting one of those two targets even if
llvm supports both.

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D148847
2023-04-21 13:07:04 -04:00
Job Noorman
df3f1e2f31 [BOLT][NFC] Fix UB due to left shift of negative value
The following test fails when enabling UBSan due to a left shift of a
negative value:

> runtime error: left shift of negative value -2

  BOLT :: AArch64/ext-island-ref.s

This patch fixes this by using a multiplication instead of a shift.

Reviewed By: yota9

Differential Revision: https://reviews.llvm.org/D148218
2023-04-13 14:29:19 +02:00
Amir Ayupov
edda85771a [BOLT][NFC] Move addRelocation{X86,AArch64} into MCPlusBuilder
The two methods don't belong in BinaryFunction.
Move the dispatch tables into target-specific MCPlusBuilder methods.

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D131813
2023-03-14 17:34:25 -07:00
Amir Ayupov
4e99891e70 [BOLT][NFC] Provide default impl for MIB methods that are only overridden on X86
Simplifies D145687

Reviewed By: #bolt, rafauler

Differential Revision: https://reviews.llvm.org/D145972
2023-03-14 17:19:24 -07:00
Amir Ayupov
223ec28da4 [BOLT][NFC] Return instruction list from createInstrIncMemory
Leverage move semantics for `std::vector`.

This also makes it consistent with `createInstrumentationSnippet`.

Reviewed By: Elvina

Differential Revision: https://reviews.llvm.org/D145465
2023-03-13 12:56:39 -07:00
Maksim Panchenko
fb28196a64 [BOLT] Fix intermittent crash with instrumentation
When createInstrumentedIndirectCall() was invoked for tail calls, we
attached annotation instruction twice to the new call instruction.
First in createDirectCall(), and then again while copying over the
metadata operands.

As a result, the annotations were not properly stripped for such calls
before the call to freeAnnotations() in LowerAnnotations pass. That led
to use-after-free while restoring the offsets with setOffset() call.

Reviewed By: yota9

Differential Revision: https://reviews.llvm.org/D144806
2023-02-27 14:11:10 -08:00
Shengchen Kan
471c0e000a [BOLT][X86][NFC] Simplify the code of X86MCPlusBuilder::getAliasSized
Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D144551
2023-02-23 10:41:28 +08:00