Commit Graph

76 Commits

Author SHA1 Message Date
spupyrev
b77172ce2f updating cache metrics
Summary:
This is a replacement of a previous diff. The implemented metric
('graph distance') is not very useful at the moment but I plan to add
more relevant metrics in the subsequent diff. This diff fixes some
obvious problems and moves the call of CalcMetrics::printAll to the
right place.

(cherry picked from FBD6072312)
2017-10-16 16:53:50 -07:00
Rafael Auler
42f957bb75 [BOLT] Integrate perf2bolt into llvm-bolt
Summary:
Move the data aggregator logic from our python script to
our C++ LLVM/BOLT libs. This has a dramatic reduction in processing
time for profiling data (from 45 minutes for HHVM to 5 minutes) because
we directly use BOLT as a disassembler in order to validate traces found
in the LBR and to add the fallthrough counts. Previously, the python
approach relied on parsing the output objdump to check traces.

(cherry picked from FBD5761313)
2017-09-01 18:13:51 -07:00
Bohan Ren
ec304396c3 [BOLT] Call Distance Metric
Summary:
Designed a new metric, which shows 93.46% correltation with Cache Miss
and 86% correlation with CPU Time.

Definition:

One can get all the traversal path for each function. And for each traversal,
we will define a distance. The distance represents how far two connected
basic blocks are. Therefore, for each traversal, I will go through the
basic blocks one by one, until the end of the traversal and sum up the
distance for the neighboring basic blocks.
Distance between two connected basic blocks is the distance of the
centers of two blocks in the binary file.

(cherry picked from FBD5242526)
2017-06-13 16:29:39 -07:00
Bill Nell
5cd58961a9 Add .bolt_info notes section containing BOLT revision and command line args.
Summary:
Optinally add a .bolt_info notes section containing BOLT revision and command line args.
The new section is controlled by the -add-bolt-info flag which is on by default.

(cherry picked from FBD5125890)
2017-05-24 14:14:16 -07:00
Maksim Panchenko
f241e252fc [BOLT] Detect and handle __builtin_unreachable().
Summary:
Calls to __builtin_unreachable() can result in a inconsistent CFG.
It was possible for basic block to end with a conditional branche
and have a single successor. Or there could exist non-terminated
basic block without successors.

We also often treated conditional jumps with destination past the end
of a function as conditional tail calls. This can be prevented
reliably at least when the byte past the end of the function does
not belong to the next function.

This diff includes several changes:
  * At disassembly stage jumps past the end of a function are converted
    into 'nops'. This is done only for cases when we can guarantee that
    the jump is not a tail call. Conversion to nop is required since the
    instruction could be referenced either by exception handling
    tables and/or debug info. Nops are later removed.
  * In CFG insert 'ret' into non-terminated basic blocks without
    successors (this almost never happens).
  * Conditional jumps at the end of the function are removed from
    CFG. The block will still have a single successor.
  * Cases where a destination of a jump instruction is the start
    of the next function, are still conservatively handled as
    (conditional) tail calls.

(cherry picked from FBD4655046)
2017-03-03 11:35:41 -08:00
Maksim Panchenko
88244a10bb [BOLT] Move BOLT passes under Passes subdirectory (NFC).
Summary:
Move passes under Passes subdirectory.

Move inlining passes under Passes/Inliner.*

(cherry picked from FBD4575832)
2017-02-16 14:57:57 -08:00
Rafael Auler
a75bbfc640 Add a frame optimization pass
Summary:
This is a first attempt to perform data flow analyses on bolt
and try to rebuild the stack frame for functions. The goal of the frame
optimization pass is to detect instructions that are accessing stack and,
if loading values, evaluate whether this load is redundant and we can
substitute the memory operation for a register load or immediate load.
To find opportunities, this pass also builds a map of clobbered registers
by function, so we use this in our analysis at call sites. If a call site
is found out to not clobber a caller-saved register but the caller is
spilling it anyway to the stack (to comply with the ABI), we should
detect these cases and remove this unnecessary move.

(cherry picked from FBD4337238)
2016-12-05 11:47:08 -08:00
Maksim Panchenko
f2d82919d0 Move debug-handling code into DWARFRewriter (NFC).
Summary: RewriteInstance.cpp is getting too big. Split the code.

(cherry picked from FBD3596103)
2016-05-31 19:12:26 -07:00
Theodoros Kasampalis
d09b00ebff Refactoring of the reordering algorithms
Summary:
The various reorder and clustering algorithms have been refactored
into separate classes, so that it is easier to add new algorithms and/or
change the logic of algorithm selection.

(cherry picked from FBD3473656)
2016-06-16 18:47:57 -07:00
Gabriel Poesia
e6acc7bb53 Optimize calls to functions that are a single unconditional jump
Summary:
Many functions (around 600) in the HHVM binary are simply
a single unconditional jump instruction to another function. These can
be trivially optimized by modifying the call sites to directly call the
branch target instead (because it also happens with more than one jump
in sequence, we do it iteratively).

This diff also adds a very simple analysis/optimization pass system in
which this pass is the first one to be implemented. A follow-up to this
could be to move the current optimizations to other passes.

(cherry picked from FBD3211138)
2016-04-15 15:59:52 -07:00
Maksim Panchenko
ff68b34553 Tool to merge .fdata files.
Summary:
merge-fdata tool takes multiple .fdata files and outputs to stdout
combined fdata. Takes about 2 seconds per each additional .fdata
file with hhvm production data.

(cherry picked from FBD3216430)
2016-04-08 12:18:06 -07:00
Gabriel Poesia
ad344c4387 Group debugging info representation and serialization code.
Summary:
Moved the classes related to representing and serializing DWARF entities into a single
header, DebugData.h.

(cherry picked from FBD3153279)
2016-04-07 15:06:43 -07:00
Gabriel Poesia
4b4db40174 Update DWARF location lists after optimization.
Summary:

    Summary: Update DWARF location lists in .debug_loc and pointers to
    them in .debug_info so that gdb can print variables which change
    location during their lifetime.

    The following changes were made:

    - Refactored BasicBlockOffsetRanges to allow ranges to be tied to binary information (so that we can reuse it for location lists)
    - Implemented range compression optimization in BasicBlockOffsetRanges (needed otherwise too much data was being generated).
    - Added representation for location lists (LocationList.h, BinaryContext.h)
    - Implemented .debug_loc serializer that keeps the updated offsets (DebugLocWriter.{h,cpp})
    - After disassembly, traverse entries in .debug_loc and save them in context (BinaryContext.cpp)
    - After optimizations, serialize .debug_loc and update pointers in .debug_info (RewriteInstance.cpp)

(cherry picked from FBD3130682)
2016-04-01 11:37:28 -07:00
Gabriel Poesia
ffa9641e16 Update DWARF lexical blocks address ranges.
Summary:
Updates DWARF lexical blocks address ranges in the output binary after optimizations.
This is similar to updating function address ranges except that the ranges representation needs
to be more general, since address ranges can begin or end in the middle of a basic block.

The following changes were made:

- Added a data structure for iterating over the basic blocks that intersect an address range: BasicBlockTable.h
- Added some more bookkeeping in BinaryBasicBlock. Basically, I needed to keep track of the block's size in the input binary as well as its address in the output binary. This information is mostly set by BinaryFunction after disassembly.
- Added a representation for address ranges relative to basic blocks (BasicBlockOffsetRanges.h). Will also serve for location lists.
- Added a representation for Lexical Blocks (LexicalBlock.h)
- Small refactorings in DebugArangesWriter:
-- Renamed to DebugRangesSectionsWriter since it also writes .debug_ranges
-- Refactored it not to depend on BinaryFunction but instead on anything that can be assined an aoffset in .debug_ranges (added an interface for that)
- Iterate over the DIE tree during initialization to find lexical blocks in .debug_info (BinaryContext.cpp)
- Added patches to .debug_abbrev and .debug_info in RewriteInstance to update lexical blocks attributes (in fact, this part is very similar to what was done to function address ranges and I just refactored/reused that code)
- Added small test case (lexical_blocks_address_ranges_debug.test)

(cherry picked from FBD3113181)
2016-03-28 17:45:22 -07:00
Gabriel Poesia
466cbae866 Update subroutine address ranges in binary.
Summary:
[WIP] Update DWARF info for function address ranges.

This diff currently does not work for unknown reasons,
but I'm describing here what's the current state.
According to both llvm-dwarf and readelf our output seems correct,
but GDB does not interpret it as expected. All details go below in
hope I missed something.

I couldn't actually track the whole change that introduced support for
what we need in gdb yet, but I think I can get to it
(2007-12-04: Support
lexical bocks and function bodies that occupy non-contiguous address ranges). I have reasons to believe gdb at least at some
nges).

The set of introduced changes was basically this:

- After disassembly, iterate over the DIEs in .debug_info and find the
ones that correspond to each BinaryFunction.
- Refactor DebugArangesWriter to also write addresses of functions to
.debug_ranges and track the offsets of function address ranges there
- Add some infrastructure to facilitate patching the binary in
simple ways (BinaryPatcher.h)
- In RewriteInstance, after writing .debug_ranges already with
function address ranges, for each function do:
-- Find the abbreviation corresponding to the function
-- Patch .debug_abbrev to replace DW_AT_low_pc with DW_AT_ranges and
DW_AT_high_pc with DW_AT_producer (I'll explain this hack below).
Also patch the corresponding forms to DW_FORM_sec_offset and
DW_FORM_string (null-terminated in-place string).
-- Patch debug_info with the .debug_ranges offset in place of
the first 4 bytes of DW_AT_low_pc (DW_AT_ranges only occupies 4
bytes whereas low_pc occupies 8), and write an arbitrary string
in-place in the other 12 bytes that were the 4 MSB of low_pc
and the 8 bytes of high_pc before the patch. This depends on
low_pc and high_pc being put consecutively by the compiler, but
it serves to validate the idea. I tried another way of doing it
that does not rely on this but it didn't work either and I believe
the reason for either not working is the same (and still unknown,
but unrelated to them. I might be wrong though, and if I find yet
another way of doing it I may try it). The other way was to
use a form of DW_FORM_data8 for the section offset. This is
disallowed by the specification, but I doubt gdb validates this,
as it's just easier to store it as 64-bit anyway as this is even
necessary to support 64-bit DWARF (which is not what gcc generates
by default apparently).

I still need to make changes to the diff to make it production-ready,
but first I want to figure out why it doesn't work as expected.

By looking at the output of llvm-dwarfdump or readelf, all of
.debug_ranges, .debug_abbrev and .debug_info seem to have been
correctly updated. However, gdb seems to have serious problems with
what we write.

(In fact, readelf --debug-dump=Ranges shows some funny warning messages
of the form ("Warning: There is a hole [0x100 - 0x120] in .debug_ranges"),
but I played around with this and it seems it's just because no
compile unit was using these ranges. Changing .debug_info apparently
changes these warnings, so they seem to be unrelated to the section
itself. Also looking at the hex dump of the section doesn't help,
as everything seems fine. llvm-dwarfdump doesn't say anything.
So I think .debug_ranges is fine.)

The result is that gdb not only doesn't show the function name as we
wanted, but it also stops showing line number information.
Apparently it's not reading/interpreting the address ranges at all,
and so the functions now have no associated address ranges, only the
symbol value which allows one to put a breakpoint in the function,
but not to show source code.

As this left me without more ideas of what to try to feed gdb with,
I believe the most promising next trial is to try to debug gdb itself,
unless someone spots anything I missed.
I found where the interesting part of the code lies for this
case (gdb/dwarf2read.c and some other related files, but mainly that one).
It seems in some parts gdb uses DW_AT_ranges for only getting
its lowest and highest addresses and setting that as low_pc and
high_pc (see dwarf2_get_pc_bounds in gdb's code and where it's called).
I really hope this is not actually the case for
function address ranges. I'll investigate this further. Otherwise
I don't think any changes we make will make it work as initially
intended, as we'll simply need gdb to support it and in that case it
doesn't.

(cherry picked from FBD3073641)
2016-03-16 18:08:29 -07:00
Gabriel Poesia
80ea31b24e Write updated .debug_aranges section after optimizations.
Summary:
Write the .debug_aranges section after optimizations to the output binary.
Each function generates at least one range and at most two (one extra for its cold part).
The writing is done manually because LLVM's implementation is tied to the output of
.debug_info (see EmitGenDwarfInfo and EmitGenDwarfARanges in lib/MC/MCDwarf.cpp),
which we don't want to trigger right now.

(cherry picked from FBD3043108)
2016-03-11 11:30:30 -08:00
Gabriel Poesia
77a6b72842 BOLT: Read and tie .debug_line info to IR.
Summary:
Reads information in the DWARF .debug_line section using LLVM and
tie every MCInst to one line of a line table from the input binary. Subsequent
diffs will update this information to match the final binary layout and
output updated line tables.

(cherry picked from FBD2989813)
2016-02-25 16:57:07 -08:00
Maksim Panchenko
d1526083fc Rename binary optimizer to BOLT.
Summary:
BOLT - Binary Optimization and Layout Tool replaces FLO.
I'm keeping .fdata extension for "feedback data".

(cherry picked from FBD2908028)
2016-02-05 14:42:04 -08:00
Rafael Auler
c67a753e3c Refactoring llvm-flo.cpp into a new class RewriteInstance, NFC.
Summary:
Previously, llvm-flo.cpp contained a long function doing lots of
different tasks. This patch refactors this logic into a separate class with
different member functions, exposing the relationship between each step of
the rewritting process and making it easier to coordinate/change it.

(cherry picked from FBD2691674)
2015-11-23 17:54:18 -08:00
Rafael Auler
2088875656 Teach llvm-flo how to read .eh_frame information from binaries
Summary: In order to reorder binaries with C++ exceptions, we first need to
read DWARF CFI (call frame info) from binaries in a table in the .eh_frame
ELF section. This table contains unwinding information we need to be aware of
when reordering basic blocks, so as to avoid corrupting it. This patch also
cleans up some code from Exceptions.cpp due to a refactoring where we moved
some functions to the LLVM's libSupport.

(cherry picked from FBD2614464)
2015-11-05 13:37:30 -08:00
Maksim Panchenko
21cc191ea8 Added function to parse and dump .gcc_except_table
Summary: Use '-print-exceptions' option to dump contents of .gcc_except_table.

(cherry picked from FBD2609925)
2015-11-02 11:50:53 -07:00
Maksim Panchenko
b4ed5cc942 Make FLO work on hhvm binary.
Summary: Fixes several issues that prevented us from running hhvm binary.

(cherry picked from FBD2543057)
2015-10-14 15:35:14 -07:00
Rafael Auler
e1a539b0ec Add initial implementation of DataReader
Summary:
This patch introduces DataReader, a module responsible for
parsing llvm flo data files into in-memory data structures.

(cherry picked from FBD2515754)
2015-10-05 18:31:25 -07:00
Maksim Panchenko
9a2fe7ebe4 Commit FLO with control flow graph.
Summary:
llvm-flo disassembles, builds control flow graph, and re-writes
simple functions.

(cherry picked from FBD2524024)
2015-10-09 17:21:14 -07:00
Maksim Panchenko
7927c14ff5 Fixed cmake.
(cherry picked from FBD28108725)
2015-10-02 12:38:07 -07:00
Maksim Panchenko
575b24d719 Initial FLO commit.
Summary:
Directory created.

(cherry picked from FBD28105260)
2015-10-02 11:55:15 -07:00