The ManualDWARFIndex class can create a index cache if the LLDB index
cache is enabled. This used to save the index to the same file,
regardless of wether the cache was a full index (no .debug_names) or a
partial index (have .debug_names, but not all .o files were had
.debug_names). So we could end up saving an index cache that was
partial, and then later load that partial index as if it were a full
index if the user set the 'settings set
plugin.symbol-file.dwarf.ignore-file-indexes true'. This would cause us
to ignore the .debug_names section, and if the index cache was enabled,
we could end up loading the partial index as if it were a full DWARF
index.
This patch detects when the ManualDWARFIndex is being used with
.debug_names, and saves out a cache file with a suffix of "-full" or
"-partial" to avoid this issue.
.. in the global namespace
The problem was the interaction of #116989 with an optimization in
GetTypesWithQuery. The optimization was only correct for non-exact
matches, but that didn't matter before this PR due to the "second layer
of defense". After that was removed, the query started returning more
types than it should.
This reverts commit 2526d5b168, reapplying
ba14dac481 after fixing the conflict with
#117532. The change is that Function::GetAddressRanges now recomputes
the returned value instead of returning the member. This means it now
returns a value instead of a reference type.
This is a follow-up/reimplementation of #115730. While working on that
patch, I did not realize that the correct (discontinuous) set of ranges
is already stored in the block representing the whole function. The
catch -- ranges for this block are only set later, when parsing all of
the blocks of the function.
This patch changes that by populating the function block ranges eagerly
-- from within the Function constructor. This also necessitates a
corresponding change in all of the symbol files -- so that they stop
populating the ranges of that block. This allows us to avoid some
unnecessary work (not parsing the function DW_AT_ranges twice) and also
results in some simplification of the parsing code.
It's basically true already (except for a brief time during
construction). This patch makes sure the objects are constructed with a
valid parent and enforces this in the type system, which allows us to
get rid of some nullptr checks.
In case of an error GetBlock would return a reference to a Block without
adding it to a parent. This doesn't seem like a good idea, and none of
the other plugins do that.
This patch fixes that by propagating errors (well, null pointers...) up
the stack.
I don't know of any specific problem that this solves, but given that
this occurs only when something goes very wrong (e.g. a corrupted PDB
file), it's quite possible noone has run into this situation, so we
can't say the code is correct either. It also gets in the way of a
refactor I'm contemplating.
In #108907, the index classes started filtering the DIEs according to
the full type query (instead of just the base name). This means that the
checks in SymbolFileDWARF are now redundant.
I've also moved the non-redundant checks so that now all checking is
done in the DWARFIndex class and the caller can expect to get the final
filtered list of types.
This is the second half of
https://github.com/llvm/llvm-project/pull/90008.
Essentially, it replaces the work of resolving template types when we
just need the qualified names with walking the DIE tree using
`DWARFTypePrinter`.
### Result
For an internal target, the time spent on `expr *this` for the first
time reduced from 28 secs to 17 secs.
Following up from https://github.com/llvm/llvm-project/pull/112928, we
can reuse the approach from Clang Sema to infer the MSInheritanceModel
and add the necessary attribute manually. This allows the inspection of
member function pointers with DWARF on Windows.
The class is only used from one place, which is trivial to implement
using the llvm class.
The main difference is that in the new implementation, the ranges are
parsed each time anew (instead of being parsed at startup and cached). I
believe this is fine because:
- this is already how things work with DWARF v5 debug_rnglists
- parsing debug_ranges is fairly fast (definitely faster than rnglists)
- generally, this result will be cached at a higher level anyway.
Browsing the code I did find one instance where that is not the case --
SymbolFileDWARF::ResolveFunctionAndBlock -- which is called each time we
resolve an address (to the block level). However, this function is
already pretty suboptimal: it first traverses the DIE tree (which
involves parsing all the DIE attributes) to find the correct block, then
it parses them again to construct the `lldb_private::Block`
representation, and *then* it uses the ID of the block DIE it found in
the first step to look up the `Block` object. If this turns out to be a
bottleneck, I think there are better ways to optimize it than caching
the debug_ranges parse.
The motiviation for this is that DWARFDebugRanges sorts the block
ranges, even though the order of the ranges is load-bearing (in the
absence of DW_AT_low_pc, the "base address" of a scope is determined by
the first range entry). Delaying the parsing (and sorting) step makes it
easier to access the first entry.
"statistics dump" currently report the statistics of all targets in
debugger instead of current target. This is wrong because there is a
"statistics dump --all-targets" option that supposed to include
everything.
This PR fixes the issue by only report statistics for current target
instead of all. It also includes the change to reset statistics debug
info/symbol table parsing/indexing time during debugger destroy. This is
required so that we report current statistics if we plan to reuse
lldb/lldb-dap across debug sessions
---------
Co-authored-by: jeffreytan81 <jeffreytan@fb.com>
This is the beginning of a different, more fundamental approach to
handling. This PR tries to tries to minimize functional changes. It only
makes sure that we store the true set of ranges inside the function
object, so that subsequent patches can make use of it.
In DWARF 4 and earlier `static const` members of structs, classes and
unions have an entry tag `DW_TAG_member`, and are also tagged as
`DW_AT_declaration`, but otherwise follow the same rules as
`DW_TAG_variable`.
This FORM already has support within LLDB to be parsed as a 16-byte
BLOCK, and all that is left to properly support it in the DWARFParser is
to add it to some enums.
With this, I can debug programs that use libstdc++.so.6.0.33 built with
GCC.
This doesn't parse S_CONSTANT case yet, because I found that the global
variable `std::strong_ordering::equal` is a S_CONSTANT and has type of
LF_STRUCTURE which is not currently handled when creating dwarf
expression for the variable. Left a TODO for it to finish later.
This makes `lldb/test/Shell/SymbolFile/PDB/ast-restore.test` and
`lldb/test/Shell/SymbolFile/PDB/calling-conventions-x86.test` pass on
windows with native pdb plugin only.
Swift types have mangled type names. This adds functionality to look up
those types through the FindTypes API by searching for the mangled type
name instead of the regular name.
## Summary
This PR is a continuation of
https://github.com/llvm/llvm-project/pull/108907 by using `.debug_names`
parent chain faster lookup for namespaces.
## Implementation
Similar to https://github.com/llvm/llvm-project/pull/108907. This PR
adds a new API: `GetNamespacesWithParents` in `DWARFIndex` base class.
The API performs the same function as `GetNamespaces()` with additional
filtering using parents `CompilerDeclContext`. A default implementation
is given in `DWARFIndex` class which parses debug info and performs the
matching. In the `DebugNameDWARFIndex` override, parents
`CompilerDeclContext` is cross checked with parent chain in
`.debug_names` for much faster filtering before fallback to base
implementation for final filtering.
## Performance Results
For the same benchmark used in
https://github.com/llvm/llvm-project/pull/108907, this PR improves: 48s
=> 28s
---------
Co-authored-by: jeffreytan81 <jeffreytan@fb.com>
ValueObject is part of lldbCore for historical reasons, but conceptually
it deserves to be its own library. This does introduce a (link-time) circular
dependency between lldbCore and lldbValueObject, which is unfortunate
but probably unavoidable because so many things in LLDB rely on
ValueObject. We already have cycles and these libraries are never built
as dylibs so while this doesn't improve the situation, it also doesn't
make things worse.
The header includes were updated with the following command:
```
find . -type f -exec sed -i.bak "s%include \"lldb/Core/ValueObject%include \"lldb/ValueObject/ValueObject%" '{}' \;
```
I've been getting complaints from users being spammed by -gmodules
missing file warnings going out of control because each object file
depends on an entire DAG of PCM files that usually are all missing at
once. To reduce this problem, this patch does two things:
1. Module now maintains a DenseMap<hash, once> that is used to display
each warning only once, based on its actual text.
2. The PCM warning itself is reworded to include less details, such as
the DIE offset, which is only useful to LLDB developers, who can get
this from the dwarf log if they need it. Because the detail is omitted
the hashing from (1) deduplicates the warnings.
rdar://138144624
## Summary
This PR improves `SymbolFileDWARF::FindTypes()` by utilizing the newly
added parent chain `DW_IDX_parent` in `.debug_names`. The proposal was
originally discussed in [this
RFC](https://discourse.llvm.org/t/rfc-improve-dwarf-5-debug-names-type-lookup-parsing-speed/74151).
## Implementation
To leverage the parent chain for `SymbolFileDWARF::FindTypes()`, this PR
adds a new API: `GetTypesWithQuery` in `DWARFIndex` base class. The API
performs the same function as `GetTypes` with additional filtering using
`TypeQuery`. Since this only introduces filtering, the callback
mechanisms at all call sites remain unchanged. A default implementation
is given in `DWARFIndex` class which parses debug info and performs the
matching. In the `DebugNameDWARFIndex` override, the parent_contexts in
the `TypeQuery` is cross checked with parent chain in `.debug_names` for
for much faster filtering before fallback to base implementation for
final filtering.
Unlike the `GetFullyQualifiedType` API, which fully consumes the
`DW_IDX_parent` parent chain for exact matching, these new APIs perform
partial subset matching for type/namespace queries. This is necessary to
support queries involving anonymous or inline namespaces. For instance,
a user might request `NS1::NS2::NS3::Foo`, while the index table's
parent chain might contain `NS1::inline_NS2::NS3::Foo`, which would fail
exact matching.
## Performance Results
In one of our internal target using `.debug_names` + split dwarf.
Expanding a "this" pointer in locals view in VSCode:
94s => 48s. (Not sure why I got 94s this time instead of 70s last week).
---------
Co-authored-by: jeffreytan81 <jeffreytan@fb.com>
LLDB code for using the type layout data from DWARF was not kicking in
for types which were initially parsed from a declaration. The problem
was in these lines of code:
```
if (type)
layout_info.bit_size = type->GetByteSize(nullptr).value_or(0) * 8;
```
which determine the types layout size by getting the size from the
lldb_private::Type object. The problem is that if the type object does
not have this information cached, this request can trigger another
(recursive) request to lay out/complete the type. This one, somewhat
surprisingly, succeeds, but does that without the type layout
information (because it hasn't been computed yet). The reasons why this
hasn't been noticed so far are:
- this is a relatively new bug. I haven't checked but I suspect it was
introduced in the "delay type definition search" patchset from this
summer -- if we search for the definition eagerly, we will always have a
cached size value.
- it requires the presence of another bug/issue, as otherwise the
automatically computed layout will match the real thing.
- it reproduces (much) more easily with -flimit-debug-info (though it is
possible to trigger it without that flag).
My fix consists of always fetching type size information from DWARF
(which so far existed as a fallback path). I'm not quite sure why this
code was there in the first place (the code goes back to before the
Great LLDB Reformat), but I don't believe it is necessary, as the type
size (for types parsed from definition DIEs) is set exactly from this
attribute (in ParseStructureLikeDIE).
They are close enough to swap lldb's `DWARFDebugArangeSet` with the llvm
one.
The difference is that llvm's `DWARFDebugArangeSet` add empty ranges
when extracting. To accommodate this, `DWARFDebugAranges` in lldb
filters out empty ranges when extracting.
This patch allows offsets to the DW_AT_frame_base to exceed 4GB. Prior
to this, for any .debug_info.dwo offset that exceeded 4GB, we would
truncate the offset to the DW_AT_frame_base expression bytes to be 32
bit only. Changing the offset to 64 bits restores correct functionality.
No test for this as we don't want to create a .dwp file that has a
.debug_info.dwo size that is over 4GB.
lldb has two RegisterLocation classes that do slightly different things.
UnwindPlan::Row::RegisterLocation (new: AbstractRegisterLocation) has a
description of how to find a register's value or location, not specific
to a particular stopping point. It may say that at a given offset into a
function, the caller's register has been spilled to stack memory at CFA
minus an offset. Or it may say that the caller's register is at a DWARF
exprssion.
UnwindLLDB::RegisterLocation (new: ConcreteRegisterLocation) is a
specific address where the register is currently stored, or the register
it has been copied into, or its value at this point in the current
function execution.
When lldb stops in a function, it interprets the
AbstractRegisterLocation's instructions using the register context and
stack memory, to create the ConcreteRegisterLocation at this point in
time for this stack frame.
I'm not thrilled with AbstractRegisterLocation and
ConcreteRegisterLocation, but it's better than the same name and it will
be easier to update them if someone suggests a better pair.
This can legitimately happen for static function-local variables with a
non-manual dwarf index. According to the DWARF spec, these variables
should be (and are) included in the compiler generated indexes, but they
are ignored by our manual index. Encountering them does not indicate any
sort of error.
The error message is particularly annoying due to a combination of the
fact that we don't cache negative hits (so we print it every time we
encounter it), VSCode's aggresive tab completion (which attempts a
lookup after every keypress) and the fact that some low-level libraries
(e.g. tcmalloc) have several local variables called "v". The result is a
console full of error messages everytime you type "v ".
We already have tests (e.g. find-basic-variable.cpp), which check that
these variables are not included in the result (and by extension, that
their presence does not crash lldb). It would be possible to extend it
to make sure it does *not* print this error message, but it doesn't seem
like it would be particularly useful.
As specified in the docs,
1) raw_string_ostream is always unbuffered and
2) the underlying buffer may be used directly
( 65b13610a5 for further reference )
* Don't call raw_string_ostream::flush(), which is essentially a no-op.
* Avoid unneeded calls to raw_string_ostream::str(), to avoid excess
indirection.
This patch fixes:
lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp:2935:31:
error: designated initializers are a C++20 extension
[-Werror,-Wc++20-designator]
This bug surfaced after https://github.com/llvm/llvm-project/pull/105865
(currently reverted, but blocked on this to be relanded).
Because Clang doesn't emit `DW_TAG_member`s for unnamed bitfields, LLDB
has to make an educated guess about whether they existed in the source.
It does so by checking whether there is a gap between where the last
field ended and the currently parsed field starts. In the example test
case, the empty field `padding` was folded into the storage of `data`.
Because the `bit_offset` of `padding` is `0x0` and its `DW_AT_byte_size`
is `0x1`, LLDB thinks the field ends at `0x1` (not quite because we
first round the size to a word size, but this is an implementation
detail), erroneously deducing that there's a gap between `flag` and
`padding`.
This patch adds the notion of "effective field end", which accounts for
fields that share storage. It is set to the end of the storage that the
two fields occupy. Then we use this to check for gaps in the unnamed
bitfield creation logic.
This logic will need adjusting soon for
https://github.com/llvm/llvm-project/pull/108155
This patch pulls out the logic for detecting/creating unnamed bitfields
out of `ParseSingleMember` to make the latter (in my opinion) more
readable. Otherwise we have a large number of similarly named variables
in scope.
This allows e.g. DWARFDIE::GetName() to return the name of the type when
looking at its declaration (which contains only
DW_AT_declaration+DW_AT_signature). This is similar to how we recurse
through DW_AT_specification when looking for a function name. Llvm dwarf
parser has obtained the same functionality through #99495.
This fixes a bug where we would confuse a type like NS::Outer::Struct
with NS::Struct (because NS::Outer (and its name) was in a type unit).
(this is lldb part)
Without these explicit includes, removing other headers, who implicitly
include llvm-config.h, may have non-trivial side effects. For example,
`clangd` may report even `llvm-config.h` as "no used" in case it defines
a macro, that is explicitly used with #ifdef. It is actually amplified
with different build configs which use different set of macros.
We need to resolve the type signature to get a hold of the template
argument dies.
The associated test case passes even without this patch, but it only
does it by accident (because the subsequent code considers the types to
be in an anonymous namespace and this not subject to uniqueing). This
will change once my other patch starts resolving names correctly.
This patch extends TypeQuery matching to support anonymous namespaces. A
new flag is added to control the behavior. In the "strict" mode, the
query must match the type exactly -- all anonymous namespaces included.
The dynamic type resolver in the itanium abi (the motivating use case
for this) uses this flag, as it queries using the name from the
demangles, which includes anonymous namespaces.
This ensures we don't confuse a type with a same-named type in an
anonymous namespace. However, this does *not* ensure we don't confuse
two types in anonymous namespacs (in different CUs). To resolve this, we
would need to use a completely different lookup algorithm, which
probably also requires a DWARF extension.
In the "lax" mode (the default), the anonymous namespaces in the query
are optional, and this allows one search for the type using the usual
language rules (`::A` matches `::(anonymous namespace)::A`).
This patch also changes the type context computation algorithm in
DWARFDIE, so that it includes anonymous namespace information. This
causes a slight change in behavior: the algorithm previously stopped
computing the context after encountering an anonymous namespace, which
caused the outer namespaces to be ignored. This meant that a type like
`NS::(anonymous namespace)::A` would be (incorrectly) recognized as
`::A`). This can cause code depending on the old behavior to misbehave.
The fix is to specify all the enclosing namespaces in the query, or use
a non-exact match.
My build of LLDB is all the time loading targets with a version of
libc++ that was built with gcc that uses the DW_FORM 0x1e that is not
implemented by LLVM, and I doubt it'll ever implement it. It's used for
some 128 bit encoding of numbers, which is just very weird. Because of
this, LLDB is showing some warnings all the time for my users, so I'm
adding a flag to control the enablement of this warning.
This patch removes all of the Set.* methods from Status.
This cleanup is part of a series of patches that make it harder use the
anti-pattern of keeping a long-lives Status object around and updating
it while dropping any errors it contains on the floor.
This patch is largely NFC, the more interesting next steps this enables
is to:
1. remove Status.Clear()
2. assert that Status::operator=() never overwrites an error
3. remove Status::operator=()
Note that step (2) will bring 90% of the benefits for users, and step
(3) will dramatically clean up the error handling code in various
places. In the end my goal is to convert all APIs that are of the form
` ResultTy DoFoo(Status& error)
`
to
` llvm::Expected<ResultTy> DoFoo()
`
How to read this patch?
The interesting changes are in Status.h and Status.cpp, all other
changes are mostly
` perl -pi -e 's/\.SetErrorString/ = Status::FromErrorString/g' $(git
grep -l SetErrorString lldb/source)
`
plus the occasional manual cleanup.
This PR is in reference to porting LLDB on AIX.
Link to discussions on llvm discourse and github:
1. https://discourse.llvm.org/t/port-lldb-to-ibm-aix/80640
2. #101657
The complete changes for porting are present in this draft PR:
https://github.com/llvm/llvm-project/pull/102601
The changes on this PR are intended to avoid namespace collision for
certain typedefs between lldb and other platforms:
1. tid_t --> lldb::tid_t
2. offset_t --> lldb::offset_t
The only change vs. the first version of the patch is that I've made
DWARFUnit linking thread-safe/unit. This was necessary because, during
the
indexing step, two skeleton units could attempt to associate themselves
with the split unit.
The original commit message was:
I ran into this when LTO completely emptied two compile units, so they
ended up with the same hash (see #100375). Although, ideally, the
compiler would try to ensure we don't end up with a hash collision even
in this case, guaranteeing their absence is practically impossible. This
patch ensures this situation does not bring down lldb.
The `CompilerType` is just a wrapper around two pointers, and there is
no usage of the `CompilerType` where those are expected to change
underneath the caller.
To make the interface more straightforward to reason about, this patch
changes all instances of `CompilerType&` to `const CompilerType&` around
the `DWARFASTParserClang` APIs.
We could probably pass these by-value, but all other APIs don't, and
this patch just makes the parameter passing convention consistent with
the rest of the file.