We are brining a new algorithm for function layout (reordering) based on the call graph (extracted from a profile data). The algorithm is an improvement of top of a known heuristic, C^3. It tries to co-locate hot and frequently executed together functions in the resulting ordering. Unlike C^3, it explores a larger search space and have an objective closely tied to the performance of instruction and i-TLB caches. Hence, the name CDS = Cache-Directed Sort. The algorithm can be used at the linking or post-linking (e.g., BOLT) stage. Refer to https://reviews.llvm.org/D152834 for the actual implementation of the reordering algorithm. This diff adds a linker option to replace the existing C^3 heuristic with CDS. The new behavior can be turned on by passing "--use-cache-directed-sort". (the plan is to make it default in a next diff) **Perf-impact** clang-10 binary (built with LTO+AutoFDO/CSSPGO): wins on top of C^3 in [0.3%..0.8%] rocksDB-8 binary (built with LTO+CSSPGO): wins on top of C^3 in [0.8%..1.5%] Note that function layout affects the perf the most on older machines (with smaller instruction/iTLB caches) and when huge pages are not enabled. The impact on newer processors with huge pages enabled is likely neutral/minor. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D152840
23 lines
707 B
C++
23 lines
707 B
C++
//===- CallGraphSort.h ------------------------------------------*- C++ -*-===//
|
|
//
|
|
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
|
|
// See https://llvm.org/LICENSE.txt for license information.
|
|
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
|
|
//
|
|
//===----------------------------------------------------------------------===//
|
|
|
|
#ifndef LLD_ELF_CALL_GRAPH_SORT_H
|
|
#define LLD_ELF_CALL_GRAPH_SORT_H
|
|
|
|
#include "llvm/ADT/DenseMap.h"
|
|
|
|
namespace lld::elf {
|
|
class InputSectionBase;
|
|
|
|
llvm::DenseMap<const InputSectionBase *, int> computeCacheDirectedSortOrder();
|
|
|
|
llvm::DenseMap<const InputSectionBase *, int> computeCallGraphProfileOrder();
|
|
} // namespace lld::elf
|
|
|
|
#endif
|