clang-p2996

Author	SHA1	Message	Date
spupyrev	b77172ce2f	updating cache metrics Summary: This is a replacement of a previous diff. The implemented metric ('graph distance') is not very useful at the moment but I plan to add more relevant metrics in the subsequent diff. This diff fixes some obvious problems and moves the call of CalcMetrics::printAll to the right place. (cherry picked from FBD6072312)	2017-10-16 16:53:50 -07:00
Rafael Auler	42f957bb75	[BOLT] Integrate perf2bolt into llvm-bolt Summary: Move the data aggregator logic from our python script to our C++ LLVM/BOLT libs. This has a dramatic reduction in processing time for profiling data (from 45 minutes for HHVM to 5 minutes) because we directly use BOLT as a disassembler in order to validate traces found in the LBR and to add the fallthrough counts. Previously, the python approach relied on parsing the output of objdump to check traces. (cherry picked from FBD5761313)	2017-09-01 18:13:51 -07:00
Bohan Ren	ec304396c3	[BOLT] Call Distance Metric Summary: Designed a new metric, which shows 93.46% correlation with Cache Miss and 86% correlation with CPU Time. Definition: One can get all the traversal paths for each function. And for each traversal, we will define a distance. The distance represents how far two connected basic blocks are. Therefore, for each traversal, I will go through the basic blocks one by one, until the end of the traversal and sum up the distance for the neighboring basic blocks. Distance between two connected basic blocks is the distance of the centers of two blocks in the binary file. (cherry picked from FBD5242526)	2017-06-13 16:29:39 -07:00
Bill Nell	5cd58961a9	Add .bolt_info notes section containing BOLT revision and command line args. Summary: Optionally add a .bolt_info notes section containing BOLT revision and command line args. The new section is controlled by the -add-bolt-info flag which is on by default. (cherry picked from FBD5125890)	2017-05-24 14:14:16 -07:00
Maksim Panchenko	f241e252fc	[BOLT] Detect and handle __builtin_unreachable(). Summary: Calls to __builtin_unreachable() can result in an inconsistent CFG. It was possible for a basic block to end with a conditional branch and have a single successor. Or there could exist a non-terminated basic block without successors. We also often treated conditional jumps with destination past the end of a function as conditional tail calls. This can be prevented reliably at least when the byte past the end of the function does not belong to the next function. This diff includes several changes: * At disassembly stage jumps past the end of a function are converted into 'nops'. This is done only for cases when we can guarantee that the jump is not a tail call. Conversion to nop is required since the instruction could be referenced either by exception handling tables and/or debug info. Nops are later removed. * In CFG insert 'ret' into non-terminated basic blocks without successors (this almost never happens). * Conditional jumps at the end of the function are removed from CFG. The block will still have a single successor. * Cases where a destination of a jump instruction is the start of the next function, are still conservatively handled as (conditional) tail calls. (cherry picked from FBD4655046)	2017-03-03 11:35:41 -08:00
Maksim Panchenko	88244a10bb	[BOLT] Move BOLT passes under Passes subdirectory (NFC). Summary: Move passes under Passes subdirectory. Move inlining passes under Passes/Inliner.* (cherry picked from FBD4575832)	2017-02-16 14:57:57 -08:00
Rafael Auler	a75bbfc640	Add a frame optimization pass Summary: This is a first attempt to perform data flow analyses on bolt and try to rebuild the stack frame for functions. The goal of the frame optimization pass is to detect instructions that are accessing stack and, if loading values, evaluate whether this load is redundant and we can substitute the memory operation for a register load or immediate load. To find opportunities, this pass also builds a map of clobbered registers by function, so we use this in our analysis at call sites. If a call site is found out to not clobber a caller-saved register but the caller is spilling it anyway to the stack (to comply with the ABI), we should detect these cases and remove this unnecessary move. (cherry picked from FBD4337238)	2016-12-05 11:47:08 -08:00
Maksim Panchenko	f2d82919d0	Move debug-handling code into DWARFRewriter (NFC). Summary: RewriteInstance.cpp is getting too big. Split the code. (cherry picked from FBD3596103)	2016-05-31 19:12:26 -07:00
Theodoros Kasampalis	d09b00ebff	Refactoring of the reordering algorithms Summary: The various reorder and clustering algorithms have been refactored into separate classes, so that it is easier to add new algorithms and/or change the logic of algorithm selection. (cherry picked from FBD3473656)	2016-06-16 18:47:57 -07:00
Gabriel Poesia	e6acc7bb53	Optimize calls to functions that are a single unconditional jump Summary: Many functions (around 600) in the HHVM binary are simply a single unconditional jump instruction to another function. These can be trivially optimized by modifying the call sites to directly call the branch target instead (because it also happens with more than one jump in sequence, we do it iteratively). This diff also adds a very simple analysis/optimization pass system in which this pass is the first one to be implemented. A follow-up to this could be to move the current optimizations to other passes. (cherry picked from FBD3211138)	2016-04-15 15:59:52 -07:00
Maksim Panchenko	ff68b34553	Tool to merge .fdata files. Summary: merge-fdata tool takes multiple .fdata files and outputs to stdout combined fdata. Takes about 2 seconds per each additional .fdata file with hhvm production data. (cherry picked from FBD3216430)	2016-04-08 12:18:06 -07:00
Gabriel Poesia	ad344c4387	Group debugging info representation and serialization code. Summary: Moved the classes related to representing and serializing DWARF entities into a single header, DebugData.h. (cherry picked from FBD3153279)	2016-04-07 15:06:43 -07:00
Gabriel Poesia	4b4db40174	Update DWARF location lists after optimization. Summary: Update DWARF location lists in .debug_loc and pointers to them in .debug_info so that gdb can print variables which change location during their lifetime. The following changes were made: - Refactored BasicBlockOffsetRanges to allow ranges to be tied to binary information (so that we can reuse it for location lists) - Implemented range compression optimization in BasicBlockOffsetRanges (needed otherwise too much data was being generated). - Added representation for location lists (LocationList.h, BinaryContext.h) - Implemented .debug_loc serializer that keeps the updated offsets (DebugLocWriter.{h,cpp}) - After disassembly, traverse entries in .debug_loc and save them in context (BinaryContext.cpp) - After optimizations, serialize .debug_loc and update pointers in .debug_info (RewriteInstance.cpp) (cherry picked from FBD3130682)	2016-04-01 11:37:28 -07:00
Gabriel Poesia	ffa9641e16	Update DWARF lexical blocks address ranges. Summary: Updates DWARF lexical blocks address ranges in the output binary after optimizations. This is similar to updating function address ranges except that the ranges representation needs to be more general, since address ranges can begin or end in the middle of a basic block. The following changes were made: - Added a data structure for iterating over the basic blocks that intersect an address range: BasicBlockTable.h - Added some more bookkeeping in BinaryBasicBlock. Basically, I needed to keep track of the block's size in the input binary as well as its address in the output binary. This information is mostly set by BinaryFunction after disassembly. - Added a representation for address ranges relative to basic blocks (BasicBlockOffsetRanges.h). Will also serve for location lists. - Added a representation for Lexical Blocks (LexicalBlock.h) - Small refactorings in DebugArangesWriter: -- Renamed to DebugRangesSectionsWriter since it also writes .debug_ranges -- Refactored it not to depend on BinaryFunction but instead on anything that can be assigned an offset in .debug_ranges (added an interface for that) - Iterate over the DIE tree during initialization to find lexical blocks in .debug_info (BinaryContext.cpp) - Added patches to .debug_abbrev and .debug_info in RewriteInstance to update lexical blocks attributes (in fact, this part is very similar to what was done to function address ranges and I just refactored/reused that code) - Added small test case (lexical_blocks_address_ranges_debug.test) (cherry picked from FBD3113181)	2016-03-28 17:45:22 -07:00
Gabriel Poesia	466cbae866	Update subroutine address ranges in binary. Summary: [WIP] Update DWARF info for function address ranges. This diff currently does not work for unknown reasons, but I'm describing here what's the current state. According to both llvm-dwarfdump and readelf our output seems correct, but GDB does not interpret it as expected. All details go below in hope I missed something. I couldn't actually track the whole change that introduced support for what we need in gdb yet, but I think I can get to it (2007-12-04: Support lexical blocks and function bodies that occupy non-contiguous address ranges). I have reasons to believe gdb at least at some nges). The set of introduced changes was basically this: - After disassembly, iterate over the DIEs in .debug_info and find the ones that correspond to each BinaryFunction. - Refactor DebugArangesWriter to also write addresses of functions to .debug_ranges and track the offsets of function address ranges there - Add some infrastructure to facilitate patching the binary in simple ways (BinaryPatcher.h) - In RewriteInstance, after writing .debug_ranges already with function address ranges, for each function do: -- Find the abbreviation corresponding to the function -- Patch .debug_abbrev to replace DW_AT_low_pc with DW_AT_ranges and DW_AT_high_pc with DW_AT_producer (I'll explain this hack below). Also patch the corresponding forms to DW_FORM_sec_offset and DW_FORM_string (null-terminated in-place string). -- Patch .debug_info with the .debug_ranges offset in place of the first 4 bytes of DW_AT_low_pc (DW_AT_ranges only occupies 4 bytes whereas low_pc occupies 8), and write an arbitrary string in-place in the other 12 bytes that were the 4 MSB of low_pc and the 8 bytes of high_pc before the patch. This depends on low_pc and high_pc being put consecutively by the compiler, but it serves to validate the idea.
I tried another way of doing it that does not rely on this but it didn't work either and I believe the reason for either not working is the same (and still unknown, but unrelated to them. I might be wrong though, and if I find yet another way of doing it I may try it). The other way was to use a form of DW_FORM_data8 for the section offset. This is disallowed by the specification, but I doubt gdb validates this, as it's just easier to store it as 64-bit anyway as this is even necessary to support 64-bit DWARF (which is not what gcc generates by default apparently). I still need to make changes to the diff to make it production-ready, but first I want to figure out why it doesn't work as expected. By looking at the output of llvm-dwarfdump or readelf, all of .debug_ranges, .debug_abbrev and .debug_info seem to have been correctly updated. However, gdb seems to have serious problems with what we write. (In fact, readelf --debug-dump=Ranges shows some funny warning messages of the form ("Warning: There is a hole [0x100 - 0x120] in .debug_ranges"), but I played around with this and it seems it's just because no compile unit was using these ranges. Changing .debug_info apparently changes these warnings, so they seem to be unrelated to the section itself. Also looking at the hex dump of the section doesn't help, as everything seems fine. llvm-dwarfdump doesn't say anything. So I think .debug_ranges is fine.) The result is that gdb not only doesn't show the function name as we wanted, but it also stops showing line number information. Apparently it's not reading/interpreting the address ranges at all, and so the functions now have no associated address ranges, only the symbol value which allows one to put a breakpoint in the function, but not to show source code. As this left me without more ideas of what to try to feed gdb with, I believe the most promising next trial is to try to debug gdb itself, unless someone spots anything I missed. 
I found where the interesting part of the code lies for this case (gdb/dwarf2read.c and some other related files, but mainly that one). It seems in some parts gdb uses DW_AT_ranges for only getting its lowest and highest addresses and setting that as low_pc and high_pc (see dwarf2_get_pc_bounds in gdb's code and where it's called). I really hope this is not actually the case for function address ranges. I'll investigate this further. Otherwise I don't think any changes we make will make it work as initially intended, as we'll simply need gdb to support it and in that case it doesn't. (cherry picked from FBD3073641)	2016-03-16 18:08:29 -07:00
Gabriel Poesia	80ea31b24e	Write updated .debug_aranges section after optimizations. Summary: Write the .debug_aranges section after optimizations to the output binary. Each function generates at least one range and at most two (one extra for its cold part). The writing is done manually because LLVM's implementation is tied to the output of .debug_info (see EmitGenDwarfInfo and EmitGenDwarfARanges in lib/MC/MCDwarf.cpp), which we don't want to trigger right now. (cherry picked from FBD3043108)	2016-03-11 11:30:30 -08:00
Gabriel Poesia	77a6b72842	BOLT: Read and tie .debug_line info to IR. Summary: Reads information in the DWARF .debug_line section using LLVM and tie every MCInst to one line of a line table from the input binary. Subsequent diffs will update this information to match the final binary layout and output updated line tables. (cherry picked from FBD2989813)	2016-02-25 16:57:07 -08:00
Maksim Panchenko	d1526083fc	Rename binary optimizer to BOLT. Summary: BOLT - Binary Optimization and Layout Tool replaces FLO. I'm keeping .fdata extension for "feedback data". (cherry picked from FBD2908028)	2016-02-05 14:42:04 -08:00
Rafael Auler	c67a753e3c	Refactoring llvm-flo.cpp into a new class RewriteInstance, NFC. Summary: Previously, llvm-flo.cpp contained a long function doing lots of different tasks. This patch refactors this logic into a separate class with different member functions, exposing the relationship between each step of the rewriting process and making it easier to coordinate/change it. (cherry picked from FBD2691674)	2015-11-23 17:54:18 -08:00
Rafael Auler	2088875656	Teach llvm-flo how to read .eh_frame information from binaries Summary: In order to reorder binaries with C++ exceptions, we first need to read DWARF CFI (call frame info) from binaries in a table in the .eh_frame ELF section. This table contains unwinding information we need to be aware of when reordering basic blocks, so as to avoid corrupting it. This patch also cleans up some code from Exceptions.cpp due to a refactoring where we moved some functions to the LLVM's libSupport. (cherry picked from FBD2614464)	2015-11-05 13:37:30 -08:00
Maksim Panchenko	21cc191ea8	Added function to parse and dump .gcc_except_table Summary: Use '-print-exceptions' option to dump contents of .gcc_except_table. (cherry picked from FBD2609925)	2015-11-02 11:50:53 -07:00
Maksim Panchenko	b4ed5cc942	Make FLO work on hhvm binary. Summary: Fixes several issues that prevented us from running hhvm binary. (cherry picked from FBD2543057)	2015-10-14 15:35:14 -07:00
Rafael Auler	e1a539b0ec	Add initial implementation of DataReader Summary: This patch introduces DataReader, a module responsible for parsing llvm flo data files into in-memory data structures. (cherry picked from FBD2515754)	2015-10-05 18:31:25 -07:00
Maksim Panchenko	9a2fe7ebe4	Commit FLO with control flow graph. Summary: llvm-flo disassembles, builds control flow graph, and re-writes simple functions. (cherry picked from FBD2524024)	2015-10-09 17:21:14 -07:00
Maksim Panchenko	7927c14ff5	Fixed cmake. (cherry picked from FBD28108725)	2015-10-02 12:38:07 -07:00
Maksim Panchenko	575b24d719	Initial FLO commit. Summary: Directory created. (cherry picked from FBD28105260)	2015-10-02 11:55:15 -07:00
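
Note on commit e6acc7bb53: the iterative redirection of call sites through functions that are a single unconditional jump can be sketched roughly as below. This is only an illustration under stated assumptions, not the pass as committed: `JumpMap` and `resolveCallTarget` are hypothetical names, and functions are identified by plain strings rather than by BOLT's own data structures.

```cpp
#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical input: maps each function whose entire body is one
// unconditional jump to the function it jumps to.
using JumpMap = std::unordered_map<std::string, std::string>;

// Follows a chain of single-jump functions starting at Target and returns
// the first function that is not itself a single jump. The Seen set stops
// the walk if the chain loops back on itself.
std::string resolveCallTarget(const JumpMap &SingleJumps, std::string Target) {
  std::unordered_set<std::string> Seen;
  while (Seen.insert(Target).second) {
    auto It = SingleJumps.find(Target);
    if (It == SingleJumps.end())
      break; // Target has a real body; call it directly.
    Target = It->second;
  }
  return Target;
}
```

A call site that originally called `a`, where `a` jumps to `b` and `b` jumps to `c`, would be rewritten to call `c` directly.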

76 Commits
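
Note on commit ec304396c3: the call distance definition in that message (sum, along one traversal, of the distances between the centers of neighboring basic blocks) can be written out as a small sketch. The names `BlockLayout` and `traversalDistance` are hypothetical and are not taken from the BOLT sources. The sketch also assumes that a block's center is its start address plus half its size.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for where a basic block sits in the binary file.
struct BlockLayout {
  uint64_t Address; // start address of the block
  uint64_t Size;    // size of the block in bytes
};

// Assumed definition of a block's center: start address plus half its size.
static double blockCenter(const BlockLayout &B) {
  return static_cast<double>(B.Address) + static_cast<double>(B.Size) / 2.0;
}

// Sums the center-to-center distance of each pair of neighboring blocks
// along one traversal. A traversal with fewer than two blocks yields 0.
double traversalDistance(const std::vector<BlockLayout> &Path) {
  double Total = 0.0;
  for (std::size_t I = 1; I < Path.size(); ++I)
    Total += std::fabs(blockCenter(Path[I]) - blockCenter(Path[I - 1]));
  return Total;
}
```

For three blocks at addresses 0, 10 and 100 with sizes 10, 20 and 4, the centers are 5, 20 and 102, so the traversal distance is 15 + 82 = 97.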