Runtime code modification used by static keys is the most ubiquitous
self-modifying feature of the Linux kernel. The idea is to eliminate
the condition check and associated conditional jump on a hot path if
that condition (based on the boolean value of a static key) does not
change often. Whenever the condition changes, the kernel modifies all
code paths associated with that key at runtime, flipping the code
between a nop and an (unconditional) jump.
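For illustration, here is a user-space sketch of the usage pattern. The
macro names (DEFINE_STATIC_KEY_FALSE, static_branch_unlikely,
static_branch_enable) are the kernel's, but the bodies below are mocks
built on a plain bool, since the real implementation patches machine code:
```
#include <cstdio>

// Mock of the kernel static-key API: the real macros flip the generated
// code between a NOP and an unconditional jump; here we emulate the
// behavior with a plain bool so the sketch runs anywhere.
#define DEFINE_STATIC_KEY_FALSE(name) static bool name = false
#define static_branch_unlikely(key) (*(key))
#define static_branch_enable(key) (*(key) = true)

DEFINE_STATIC_KEY_FALSE(debug_enabled);

static void hot_path(int x) {
  // With real static keys this check costs a single NOP while the key
  // is false; enabling the key patches the NOP into a jump.
  if (static_branch_unlikely(&debug_enabled))
    std::printf("debug: x = %d\n", x);
}

int main() {
  hot_path(1);                          // key off: branch not taken
  static_branch_enable(&debug_enabled); // kernel would patch call sites
  hot_path(2);                          // key on: branch taken
}
```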
Refactor MCPlusBuilder's create{Instruction}() functions that used to
return bool. We almost never check the return value as we rely on
llvm_unreachable() to detect unimplemented functionality. There were a
couple of cases that checked the return value, but they would hit the
unreachable condition first (at least in debug builds) before the return
value was ever checked.
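For reference, a minimal self-contained sketch of the pattern
(llvm_unreachable() is stubbed out and the method name is illustrative):
```
#include <cstdio>
#include <cstdlib>

// Stand-in for llvm_unreachable() to keep the sketch self-contained.
[[noreturn]] static void unreachable(const char *Msg) {
  std::fprintf(stderr, "UNREACHABLE executed: %s\n", Msg);
  std::abort();
}

struct MCInst {};

struct MCPlusBuilderSketch {
  // Before: bool createReturn(MCInst &) whose result was rarely checked.
  // After: void return type; an unimplemented target aborts here first.
  virtual void createReturn(MCInst &) {
    unreachable("createReturn is not implemented for this target");
  }
  virtual ~MCPlusBuilderSketch() = default;
};
```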
Make core BOLT functionality more friendly to being used as a
library instead of in our standalone driver llvm-bolt. To
accomplish this, we augment BinaryContext with journaling streams
that are to be used by most BOLT code whenever something needs to
be logged to the screen. Users of the library can decide whether logs
should be printed to a file, to the screen (as before), or suppressed
entirely. To illustrate this, this patch adds a new option
`--log-file` that allows the user to redirect BOLT logging to a
file on disk or completely hide it by using
`--log-file=/dev/null`. Future BOLT code should now use
`BinaryContext::outs()` for printing important messages instead of
`llvm::outs()`. A new test log.test enforces this by verifying that
no strings are printed to the screen once the `--log-file` option is
used.
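A sketch of the intended call pattern (assuming a BinaryContext
reference at hand):
```
#include "bolt/Core/BinaryContext.h"

using namespace llvm::bolt;

// Route diagnostics through the journaling stream so that --log-file
// redirection is honored; llvm::outs() would bypass it.
static void reportProcessed(BinaryContext &BC, unsigned NumFuncs) {
  BC.outs() << "BOLT-INFO: processed " << NumFuncs << " functions\n";
}
```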
In previous patches we also added a new BOLTError class to report
common and fatal errors, so code should no longer call exit(1). To
easily handle problems as before (by quitting with exit(1)),
callers can now use
`BinaryContext::logBOLTErrorsAndQuitOnFatal(Error)` whenever code
needs to deal with BOLT errors. To test this, we have fatal.s
that checks we are correctly quitting and printing a fatal error
to the screen.
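For example, a hypothetical caller-side wrapper might look like this:
```
#include "bolt/Core/BinaryContext.h"
#include "llvm/Support/Error.h"

#include <utility>

using namespace llvm;
using namespace llvm::bolt;

// Bubble the Error up, then let the helper print it and call exit(1)
// only when the error is fatal.
static void runStep(BinaryContext &BC, Error (*Step)(BinaryContext &)) {
  if (Error E = Step(BC))
    BC.logBOLTErrorsAndQuitOnFatal(std::move(E));
}
```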
Because this is a significant change by itself, not all code has been
ported yet. Code from the Profiler libs (DataAggregator and friends)
still prints errors directly to the screen.
Co-authored-by: Rafael Auler <rafaelauler@fb.com>
Test Plan: NFC
As part of the effort to refactor old error handling code that
would directly call exit(1), this patch continues the migration of
libCore, libRewrite and libPasses to use the new BOLTError
class whenever a failure occurs.
Test Plan: NFC
Co-authored-by: Rafael Auler <rafaelauler@fb.com>
As part of the effort to refactor old error handling code that
would directly call exit(1), in this patch we add a new class
BOLTError and auxiliary functions `createFatalBOLTError()` and
`createNonFatalBOLTError()` that allow BOLT code to bubble up the
problem to the caller by using the Error class as a return
type (or Expected). Passes are also changed to use these helpers.
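A sketch of the intended usage; the helper signatures below are assumed
to take a message string like similar LLVM error factories and may
differ from the actual ones:
```
#include "llvm/Support/Error.h"

using namespace llvm;

// Assumed declarations matching the names introduced in this patch:
Error createFatalBOLTError(const Twine &Msg);
Error createNonFatalBOLTError(const Twine &Msg);

static Expected<uint64_t> computeAddress(bool Corrupt, bool BadInput) {
  if (Corrupt) // unrecoverable: the caller should quit
    return createFatalBOLTError("corrupt input: cannot continue");
  if (BadInput) // recoverable: the caller may skip this function
    return createNonFatalBOLTError("skipping function with bad input");
  return 0x400000;
}
```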
Co-authored-by: Rafael Auler <rafaelauler@fb.com>
Test Plan: NFC
As part of the effort to refactor old error handling code that
would directly call exit(1), in this patch we change the
interface to `BinaryFunctionPass` to return an Error on
`runOnFunctions()`. This gives passes the ability to report a
serious problem to the caller (RewriteInstance class), so the
caller may decide how to best handle the exceptional situation.
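A pass under the new interface might look like the following sketch;
everything except the runOnFunctions() signature is illustrative:
```
#include "llvm/Support/Error.h"

using namespace llvm;

class BinaryContext; // BOLT's context class, declared elsewhere

Error createFatalBOLTError(const Twine &Msg); // assumed helper

class SketchPass /* : public BinaryFunctionPass */ {
public:
  // Report serious problems to RewriteInstance instead of exit(1).
  Error runOnFunctions(BinaryContext &BC) /* override */ {
    (void)BC;
    bool Unrecoverable = false; // placeholder for a real check
    if (Unrecoverable)
      return createFatalBOLTError("SketchPass: unrecoverable state");
    return Error::success();
  }
};
```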
Co-authored-by: Rafael Auler <rafaelauler@fb.com>
Test Plan: NFC
We run the CheckLargeFunctions pass in non-relocation mode to prevent the
emission of functions that could not later be written to the output due
to their large size. The main reason behind the pass is to prevent the
emission of metadata for such functions since this metadata becomes
incorrect if the function is left unmodified.
Currently, the pass is enabled in non-relocation mode only when debug
info output is also enabled. As we emit increasingly more kinds of
metadata, e.g. for the Linux kernel, it becomes more challenging to
track metadata that needs to be fixed. Hence, I'm enabling the pass to
always run in non-relocation mode.
We used to delete most instruction annotations before code emission. It
was done to release memory taken by annotations and to reduce overall
memory consumption. However, since the implementation of annotations has
moved to using existing instruction operands, the memory overhead
associated with them has reduced drastically. I measured that savings
are less than 0.5% on large binaries and processing time is just
slightly reduced if we keep them. Additionally, I plan to use
annotations in pre-emission passes for the Linux kernel rewriter.
It's beneficial to have uniform reporting in both `infer-stale-profile`
on and off cases, primarily for logging purposes.
Without this change, BOLT would report "input" staleness in
`infer-stale-profile=0` case (without matching), and "output" staleness
in `infer-stale-profile=1` case (after matching).
This change makes BOLT report "input" staleness in both cases. "Output"
staleness information is printed separately with "BOLT-INFO: inferred
profile..."
Use MCAsmBackend::writeNopData() interface to emit NOP instructions on
x86. There are multiple forms of NOP instruction on x86 with different
sizes. Currently, LLVM's assembly/disassembly does not support all forms
correctly, which can break the semantics of the input code, e.g. if the
program relies on NOP instructions to reserve patch space.
Add "--keep-nops" option to preserve NOP instructions.
After #70147, all primary annotation types are stored directly in the
instruction and hence there's no need for the temporary storage we've
used previously for repopulating preserved annotations.
We emit a symbol before an instruction for a number of reasons, e.g. for
tracking LocSyms or debug line info, or if the instruction has a label
annotation. Currently, we may emit multiple symbols per instruction.
Reuse the same label instead of creating and emitting new ones when
possible. I'm planning to refactor EH labels as well in a separate diff.
Change getLabel() to return a pointer instead of std::optional<> since
an empty label should be treated identically to no label.
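With the pointer-returning getLabel(), call sites reduce to a null
check, as in this sketch (assuming BinaryContext's MIB member as in BOLT):
```
#include "bolt/Core/BinaryContext.h"
#include "llvm/MC/MCInst.h"
#include "llvm/MC/MCStreamer.h"

using namespace llvm;
using namespace llvm::bolt;

// Emit the instruction's label, if any; nullptr now simply means the
// instruction carries no label annotation.
static void emitInstLabel(BinaryContext &BC, MCStreamer &Streamer,
                          const MCInst &Inst) {
  if (MCSymbol *Label = BC.MIB->getLabel(Inst))
    Streamer.emitLabel(Label);
}
```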
On RISC-V, there are certain relocations that target a specific
instruction instead of a more abstract location like a function or basic
block. Take the following example that loads a value from symbol `foo`:
```
nop
1: auipc t0, %pcrel_hi(foo)
ld t0, %pcrel_lo(1b)(t0)
```
This results in two relocations:
- auipc: `R_RISCV_PCREL_HI20` referencing `foo`;
- ld: `R_RISCV_PCREL_LO12_I` referencing the local label `1`, which
points to the auipc instruction.
It is of utmost importance that the `R_RISCV_PCREL_LO12_I` relocation keeps
referring to the auipc instruction; if not, the program will fail to
assemble. However, BOLT currently does not guarantee this.
BOLT currently assumes that all local symbols are jump targets and
always starts a new basic block at symbol locations. The example above
results in a CFG that looks like this:
```
.BB0:
nop
.BB1:
auipc t0, %pcrel_hi(foo)
ld t0, %pcrel_lo(.BB1)(t0)
```
While this currently works (i.e., the `R_RISCV_PCREL_LO12_I` relocation
points to the correct instruction), it has two downsides:
- Too many basic blocks are created (the example above is logically a
single block, yet two blocks are created);
- If instructions are inserted in `.BB1` (e.g., by instrumentation),
things will break since the label will not point to the auipc anymore.
This patch proposes to fix this issue by teaching BOLT to track labels
that should always point to a specific instruction. This is implemented
as follows:
- Add a new annotation type (`kLabel`) that allows us to annotate
instructions with an `MCSymbol *`;
- Whenever we encounter a relocation type that is used to refer to a
specific instruction (`Relocation::isInstructionReference`), we
register it without a symbol;
- During disassembly, whenever we encounter an instruction with such a
relocation, we create a symbol for its target and store it in an
offset-to-symbol map (to ensure multiple relocations referencing the
same instruction use the same label);
- After disassembly, iterate this map to attach labels to instructions
via the new annotation type (sketched after this list);
- During emission, emit these labels right before the instruction.
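A sketch of this bookkeeping, with illustrative names rather than the
exact BOLT API:
```
#include <cstdint>
#include <map>

// Illustrative stand-ins for the real BOLT/LLVM types and helpers.
struct MCSymbol;
struct MCInst;
MCSymbol *createTempSymbol();                 // e.g., via MCContext
MCInst *getInstructionAtOffset(uint64_t Off); // function-local lookup
void setLabelAnnotation(MCInst &Inst, MCSymbol *Label); // the new kLabel

// One label per referenced offset, shared by all relocations that point
// at the same instruction.
std::map<uint64_t, MCSymbol *> InstructionLabels;

// Called when an instruction-reference relocation targets TargetOffset.
void noteInstructionReference(uint64_t TargetOffset) {
  MCSymbol *&Label = InstructionLabels[TargetOffset];
  if (!Label)
    Label = createTempSymbol();
}

// After disassembly: attach each label to its instruction; emission
// later places the label right before that instruction.
void attachInstructionLabels() {
  for (auto &[Offset, Label] : InstructionLabels)
    if (MCInst *Inst = getInstructionAtOffset(Offset))
      setLabelAnnotation(*Inst, Label);
}
```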
I believe the use of annotations works quite well for this use case as
it allows us to reliably track instruction labels. If we were to store
them as offsets in basic blocks, it would be error-prone to keep them
updated whenever instructions are inserted or removed.
I have chosen to add labels as first-class annotations (as opposed to a
generic one) because the documentation of `MCAnnotation` suggests that
generic annotations are to be used for optional metadata that can be
discarded without affecting correctness. As this is not the case for
labels, a first-class annotation seemed more appropriate.
Adding some logs related to stale profile matching. The new data can be helpful
to understand how "stale" the input profile is and how well the inference is
able to utilize the stale data.
Example of outputs on clang-10 built with LTO (profile collected on a year-old release):
```
BOLT-INFO: inferred profile for 2101 (18.52% of profiled, 100.00% of stale) functions responsible for 30.95% samples (14754697 out of 47670654)
BOLT-INFO: stale inference matched 89.42% of basic blocks (79052 out of 88402 stale) responsible for 76.99% samples (645737 out of 838719 stale)
```
LTO+AutoFDO:
```
BOLT-INFO: inferred profile for 6146 (57.57% of profiled, 100.00% of stale) functions responsible for 90.34% samples (50891403 out of 56330313)
BOLT-INFO: stale inference matched 74.55% of basic blocks (191295 out of 256589 stale) responsible for 57.30% samples (1288632 out of 2248799 stale)
```
Reviewed By: Amir, maksfb
Differential Revision: https://reviews.llvm.org/D154737
When optimization passes do not change anything, skip their diagnostics
output. NFC otherwise.
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D153386
BOLT often has to deal with profiles collected on binaries built from several
revisions behind release. As a result, a certain percentage of functions is
considered stale and not optimized. This diff adds an ability to match profile
to functions that are not 100% binary identical, which increases the
optimization coverage and boosts the performance of applications.
The algorithm consists of two phases, matching and inference:
- At the matching phase, we try to "guess" as many block and jump counts from
the stale profile as possible. To this end, the content of each basic block
is hashed and stored in the (yaml) profile. When BOLT optimizes a binary,
it computes block hashes and identifies the corresponding entries in the
stale profile. It yields a partial profile for every CFG in the binary
(see the sketch after this list).
- At the inference phase, we employ a network flow-based algorithm (profi) to
reconstruct "realistic" block and jump counts from the partial profile
generated at the first stage. In practice, we don't always produce proper
profile data but the majority (e.g., >90%) of CFGs get the correct counts.
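A toy sketch of the matching phase; the real BOLT hashing and profile
data structures are considerably richer:
```
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

using BlockHash = uint64_t;
using BlockId = unsigned;

// Map from a block's content hash (stored in the yaml profile) to its
// stale execution count.
using StaleProfile = std::unordered_map<BlockHash, uint64_t>;

// Matching phase: recover counts for blocks whose hashes still appear
// in the stale profile; unmatched blocks are left for the inference
// phase (profi) to fill in.
std::unordered_map<BlockId, uint64_t>
matchBlocks(const std::vector<std::pair<BlockId, BlockHash>> &CFGBlocks,
            const StaleProfile &Profile) {
  std::unordered_map<BlockId, uint64_t> Partial;
  for (const auto &[Block, Hash] : CFGBlocks)
    if (auto It = Profile.find(Hash); It != Profile.end())
      Partial[Block] = It->second; // "guessed" block count
  return Partial;
}
```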
This is the first part of the change; the next stacked diff extends the block hashing
and provides perf evaluation numbers.
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D144500
We have mostly harmless data races when running
BinaryContext::calculateEmittedSize() in parallel during the split
functions pass. However, it is possible to end up in a state
where some MCSymbols are still registered and our cleanup has
failed. This happens rarely, but when it does, it is a
difficult-to-diagnose heisenbug. To avoid this, add a new
cleanup pass to perform a last check on MCSymbols, before they
undergo our final emission pass, to verify that they are in a sane
state. If we fail to do this, we might resolve some symbols to zero
and produce a crashing output binary.
Reviewed By: #bolt, Amir
Differential Revision: https://reviews.llvm.org/D137984
Using the option `-print-sorted-by=.` causes a core dump, so change it to a legal value.
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D140847
I went over the output of the following mess of a command:
`(ulimit -m 2000000; ulimit -v 2000000; git ls-files -z | parallel --xargs -0 cat | aspell list --mode=none --ignore-case | grep -E '^[A-Za-z][a-z]*$' | sort | uniq -c | sort -n | grep -vE '.{25}' | aspell pipe -W3 | grep : | cut -d' ' -f2 | less)`
and proceeded to spend a few days looking at it to find probable typos
and fixed a few hundred of them in all of the llvm project (note, the
ones I found are not anywhere near all of them, but it seems like a
good start).
Reviewed By: Amir, maksfb
Differential Revision: https://reviews.llvm.org/D130824
This changes `FunctionFragment` from being used as a temporary proxy
object to access basic block ranges to a heap-allocated object that can
store fragment-specific information.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D132050
A const-qualified reference to function layout allows accessing
non-const qualified basic blocks on a const-qualified function. This
patch adds or removes const-qualifiers where necessary to indicate where
basic blocks are used in a non-const manner.
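A toy illustration of the hole being closed, using simplified types
rather than the BOLT classes:
```
#include <vector>

struct BasicBlock { int ExecutionCount = 0; };

struct FunctionLayout {
  std::vector<BasicBlock *> Blocks;
  // Const method handing out non-const pointers: callers can mutate
  // blocks through a const-qualified function/layout reference.
  const std::vector<BasicBlock *> &blocks() const { return Blocks; }
};

void mutateThroughConst(const FunctionLayout &Layout) {
  for (BasicBlock *BB : Layout.blocks())
    BB->ExecutionCount = 0; // compiles, despite the const reference
}
```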
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D132049
This adds basic fragment awareness in the exception handling passes and
generates the necessary symbols for fragments.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D130520
To track whether a function's new layout is different from its old
layout when updating it, the old layout would be kept around in memory
indefinitely (if the new layout is different). This was used only for
debugging/logging purposes. This patch removes the old layout fields,
forcing callers of the function layout's update method to copy the old
layout into a temporary if they need it.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D131413
This patch adds a dedicated class to keep track of each function's
layout. It also lays the groundwork for splitting functions into
multiple fragments (as opposed to a strict hot/cold split).
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D129518
As we are moving towards support for multiple fragments, loops that
iterate over all basic blocks of a function, but do not depend on the
order of basic blocks in the final layout, should iterate over the
binary function directly, rather than over the layout.
Eventually, all loops using the layout list should either iterate over
the function, or be aware of multiple layouts. This patch replaces
references to binary function's block layout with the binary function
itself where only minimal code changes are necessary.
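The preferred idioms, sketched with hedged accessor names (the exact
spelling varied across this refactoring series):
```
#include "bolt/Core/BinaryFunction.h"

using namespace llvm::bolt;

static void visitBlocks(BinaryFunction &Function) {
  // Order-independent work: iterate the function itself.
  for (BinaryBasicBlock &BB : Function)
    (void)BB;

  // Order-dependent work: go through the layout explicitly.
  for (BinaryBasicBlock *BB : Function.getLayout().blocks())
    (void)BB;
}
```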
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D129585
The gold linker veneers are written between functions without symbols,
so we need to handle them specially in BOLT.
Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei
Differential Revision: https://reviews.llvm.org/D129260
This reverts commit 425dda76e9.
This commit is currently causing BOLT to crash in one of our
binaries and needs a bit more checking to make sure it is safe
to land.
The gold linker veneers are written between functions without symbols,
so we need to handle them specially in BOLT.
Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei
Differential Revision: https://reviews.llvm.org/D128082
Some of the passes that calculate a tentative layout, such as LongJmp
and Golang, expect that only functions with a valid index will be
located in the hot text section. Currently, functions with valid
profiles but no index set break this logic; to fix this, we move the
hasValidProfile() condition from the AssignSections pass to
ReorderFunctions.
Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei
Differential Revision: https://reviews.llvm.org/D127223
Emit a warning when using the deprecated option '-reorder-blocks=cache+'.
Automatically switch to the option '-reorder-blocks=ext-tsp'.
Test Plan:
```
ninja check-bolt
```
Added a new test, cache+-deprecated.test.
Ran the upstream tests and verified that they pass.
Reviewed By: rafauler, Amir, maksfb
Differential Revision: https://reviews.llvm.org/D126722
Since LLVM MC now preserves the redundant AdSize override prefix (0x67),
remove it in BOLT explicitly (-x86-strip-redundant-adsize, on by default).
Test Plan:
`bin/llvm-lit -a bolt/test/X86/addr32.s`
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D120975
Summary:
Move the annotation to avoid dynamic memory allocations.
Improves the CPU time of instrumenting a large binary by 1% (+-0.8%, p-value 0.01)
Test Plan: NFC
Reviewers: maksfb
FBD30091656