Indexing a single DWARF unit is a fairly small task, which means the
overhead of enqueueing a task for each unit is not negligible (mainly
because introduces a lot of synchronization points for queue management,
memory allocation etc.). This is particularly true if the binary was
built with type units, as these are usually very small.
This essentially brings us back to the state before
https://reviews.llvm.org/D78337, but the new implementation is built on
the llvm ThreadPool, and I've added a small improvement -- we now
construct one "index set" per thread instead of one per unit, which
should lower the memory usage (fewer small allocations) and make the
subsequent merge step faster.
On its own this patch doesn't actually change the performance
characteristics because we still have one choke point -- progress
reporting. I'm leaving that for a separate patch, but I've tried that
simply removing the progress reporting gives us about a 30-60% speed
boost.