If the symbol is local we don't need to create a R_X86_64_DTPOFF64, we
can just write the correct value in the got.
Should fix pr28018.
llvm-svn: 272205
MergedInputSection::getOffset is the busiest function in LLD if string
merging is enabled and input files have lots of mergeable sections.
It is usually the case when creating executable with debug info,
so it is pretty common.
The reason why it is slow is because it has to do faily complex
computations. For non-mergeable sections, section contents are
contiguous in output, so in order to compute an output offset,
we only have to add the output section's base address to an input
offset. But for mergeable strings, section contents are split for
merging, so they are not contigous. We've got to do some lookups.
We used to do binary search on the list of section pieces.
It is slow because I think it's hostile to branch prediction.
This patch replaces it with hash table lookup. Seems it's working
pretty well. Below is "perf stat -r10" output when linking clang
with debug info. In this case this patch speeds up about 4%.
Before:
6584.153205 task-clock (msec) # 1.001 CPUs utilized ( +- 0.09% )
238 context-switches # 0.036 K/sec ( +- 6.59% )
0 cpu-migrations # 0.000 K/sec ( +- 50.92% )
1,067,675 page-faults # 0.162 M/sec ( +- 0.15% )
18,369,931,470 cycles # 2.790 GHz ( +- 0.09% )
9,640,680,143 stalled-cycles-frontend # 52.48% frontend cycles idle ( +- 0.18% )
<not supported> stalled-cycles-backend
21,206,747,787 instructions # 1.15 insns per cycle
# 0.45 stalled cycles per insn ( +- 0.04% )
3,817,398,032 branches # 579.786 M/sec ( +- 0.04% )
132,787,249 branch-misses # 3.48% of all branches ( +- 0.02% )
6.579106511 seconds time elapsed ( +- 0.09% )
After:
6312.317533 task-clock (msec) # 1.001 CPUs utilized ( +- 0.19% )
221 context-switches # 0.035 K/sec ( +- 4.11% )
1 cpu-migrations # 0.000 K/sec ( +- 45.21% )
1,280,775 page-faults # 0.203 M/sec ( +- 0.37% )
17,611,539,150 cycles # 2.790 GHz ( +- 0.19% )
10,285,148,569 stalled-cycles-frontend # 58.40% frontend cycles idle ( +- 0.30% )
<not supported> stalled-cycles-backend
18,794,779,900 instructions # 1.07 insns per cycle
# 0.55 stalled cycles per insn ( +- 0.03% )
3,287,450,865 branches # 520.799 M/sec ( +- 0.03% )
72,259,605 branch-misses # 2.20% of all branches ( +- 0.01% )
6.307411828 seconds time elapsed ( +- 0.19% )
Differential Revision: http://reviews.llvm.org/D20645
llvm-svn: 270999
MIPS .reginfo and .MIPS.options sections are consumed by the linker, and
the linker produces a single output section. But it is possible that
input files contain section symbol points to the corresponding input
section. In case of generation a relocatable output we need to write
such symbols to the output file.
Fixes bug 27878.
Differential Revision: http://reviews.llvm.org/D20688
llvm-svn: 270910
This patch makes SectionPiece class 8 bytes smaller on platforms
on which pointer size is 8 bytes. Sean suggested in a post commit
review for r270340 that this could make a differentce, and it
actually is. Time to link clang (with debug info) improved from
6.725 seconds to 6.589 seconds or by about 2%.
Differential Revision: http://reviews.llvm.org/D20613
llvm-svn: 270717
This patch addresses a post-commit review for r270325. r270325
introduced getReloc function that searches a relocation for a
given range. It always started searching from beginning of relocation
vector, so it was slower than before. Previously, we used to use
the fact that the relocations are sorted. This patch restore it.
llvm-svn: 270572
.eh_frame_hdr assumes that there is only one .eh_frame and
ensures it by assertions. This patch makes .eh_frame a real
singleton object to simplify.
llvm-svn: 270445
Previously, EhFrameHdr section computed addresses to which FDEs are
applied to. This is not an ideal design because EhFrameHdr does not
know much about FDEs unless EhFrame passes the information to EhFrameHdr.
It is what we did.
This patch simplifies the code by making EhFrame to compute the
values and pass the cooked information to EhFrameHdr. EhFrameHdr no
longer have to know about the details of FDEs such as FDE encodings.
llvm-svn: 270393
This patch refactors EHOutputSection using SectionPiece struct.
EHRegion class was removed since we can now directly use SectionPiece.
An incomplete support of large CIE/FDE record (> 2^32 bytes) was removed
because it silently created broken executable. There are several places
in the existing code that "size" field is always 4 bytes and at offset 4
in the record, which is not true for 64-bit size records. We will have to
support that in future, but it is better to error out instead of creating
malformed eh_frame sections.
llvm-svn: 270382
This patch adds Size member to SectionPiece so that getRangeAndSize
can just return a SectionPiece instead of a std::pair<SectionPiece *, uint_t>.
Also renamed the function.
llvm-svn: 270346
We were using std::pair to represents pieces of splittable section
contents. It hurt readability because "first" and "second" are not
meaningful. This patch give them names.
One more thing is that piecewise liveness information is stored to
the second element of the pair as a special value of output section
offset. It was confusing, so I defiend a new bit, "Live", in the
new struct.
llvm-svn: 270340
This fixes a potential bug when cross linking very large executables
on LLP64 machines such as Windows. On such platform, uintX_t is 64 bits
while unsigned is 32 bits.
llvm-svn: 270327
Most functions take destination buffers as the first arguments
just like memcpy, so this order is easier to read.
Also simplified the function.
llvm-svn: 270324
Lazy binding is quite important for use case like a shared build of
llvm. Also, if someone wants to disable it, it is better done in the
compiler (disable plt generation).
The only reason to keep it is to make it easier to add a new
architecture. But it doesn't really help much as it is possible to start
with non lazy relocation and plt code but still let the generic part
create a dedicated .got.plt and .rela.plt.
llvm-svn: 269982
If you specify the option in the form of --build-id=0x<hexstring>,
that hexstring is set as a build ID. We observed that the feature
is actually in use in some builds, so we want this feature.
llvm-svn: 269495
win32 was my case.
Before that change test failed with next error for me:
23> ******************** TEST 'lld :: ELF/mips-64-got.s' FAILED ********************
....
23> Command 3 Stderr:
23> relocation R_MIPS_GOT_PAGE out of range
llvm-svn: 269166
This is the option which sorts relocs to optimize dynamic linker performance.
-z combelocs is the default in gold, also it ignores -z nocombreloc,
this patch do the same.
Patch sorts relocations by symbols only and do not create any
DT_REL[A]COUNT entries. That is different with what gold/bfd do.
More information about option is here:
http://www.airs.com/blog/archives/186http://people.redhat.com/jakub/prelink.pdf, p.2
Differential revision: http://reviews.llvm.org/D19528
llvm-svn: 269066
We were previously using an output offset of -1 for both GC'd and tail
merged pieces. We need to distinguish these two cases in order to filter
GC'd symbols from the symbol table -- we were previously asserting when we
asked for the VA of a symbol pointing into a dead piece, which would end
up asking the tail merging string table for an offset even though we hadn't
initialized it properly.
This patch fixes the bug by using an offset of -1 to exclusively mean GC'd
pieces, using 0 for tail merges, and distinguishing the tail merge case from
an offset of 0 by asking the output section whether it is tail merge.
Differential Revision: http://reviews.llvm.org/D19953
llvm-svn: 268604
MIPS N64 ABI introduces .MIPS.options section which specifies miscellaneous
options to be applied to an object/shared/executable file. LLVM as well as
modern versions of GNU tools read and write the only type of the options -
ODK_REGINFO. It is exact copy of .reginfo section used by O32 ABI.
llvm-svn: 268485
Weak undefined symbols resolve to the image base. This is a little strange,
but it allows us to link function calls to such symbols. Normally such a
call will be guarded with a comparison, which will load a zero from the GOT.
There's one example of such a function call in crti.o in Linux's CRT.
As part of this change, I also needed to make the synthetic start and end
symbols image base relative in the case where their sections were empty,
so that PC-relative references to those symbols would continue to work.
Differential Revision: http://reviews.llvm.org/D19844
llvm-svn: 268350
This change simplifies the BuildId classes by removing a few member
functions and variables from them. It should also make it easy to
parallelize hash computation in future because now each BuildId object
see all inputs rather than one at a time.
llvm-svn: 268333