When parsing an optimized value and reading a piece from a file address,
LLDB tries to read the data from memory using that address.
This patch converts file address to load address before reading the
memory.
Fixes#111313Fixes#97484
This is the second half of
https://github.com/llvm/llvm-project/pull/90008.
Essentially, it replaces the work of resolving template types when we
just need the qualified names with walking the DIE tree using
`DWARFTypePrinter`.
### Result
For an internal target, the time spent on `expr *this` for the first
time reduced from 28 secs to 17 secs.
Following up from https://github.com/llvm/llvm-project/pull/112928, we
can reuse the approach from Clang Sema to infer the MSInheritanceModel
and add the necessary attribute manually. This allows the inspection of
member function pointers with DWARF on Windows.
The class is only used from one place, which is trivial to implement
using the llvm class.
The main difference is that in the new implementation, the ranges are
parsed each time anew (instead of being parsed at startup and cached). I
believe this is fine because:
- this is already how things work with DWARF v5 debug_rnglists
- parsing debug_ranges is fairly fast (definitely faster than rnglists)
- generally, this result will be cached at a higher level anyway.
Browsing the code I did find one instance where that is not the case --
SymbolFileDWARF::ResolveFunctionAndBlock -- which is called each time we
resolve an address (to the block level). However, this function is
already pretty suboptimal: it first traverses the DIE tree (which
involves parsing all the DIE attributes) to find the correct block, then
it parses them again to construct the `lldb_private::Block`
representation, and *then* it uses the ID of the block DIE it found in
the first step to look up the `Block` object. If this turns out to be a
bottleneck, I think there are better ways to optimize it than caching
the debug_ranges parse.
The motiviation for this is that DWARFDebugRanges sorts the block
ranges, even though the order of the ranges is load-bearing (in the
absence of DW_AT_low_pc, the "base address" of a scope is determined by
the first range entry). Delaying the parsing (and sorting) step makes it
easier to access the first entry.
This is a follow-up for the conversation here
https://github.com/llvm/llvm-project/pull/115722/.
This test is designed for Windows target/PDB format, so it shouldn't be
built and run for DWARF/etc.
This test fails on
https://lab.llvm.org/staging/#/builders/197/builds/76/steps/18/logs/FAIL__lldb-shell__inline_sites_live_cpp
because of a little difference in the lldb output.
```
# .---command stderr------------
# | C:\buildbot\as-builder-10\lldb-x-aarch64\llvm-project\lldb\test\Shell\SymbolFile\NativePDB\inline_sites_live.cpp:25:11: error: CHECK: expected string not found in input
# | // CHECK: * thread #1, stop reason = breakpoint 1
# | ^
# | <stdin>:1:1: note: scanning from here
# | (lldb) platform select remote-linux
# | ^
# | <stdin>:28:27: note: possible intended match here
# | * thread #1, name = 'inline_sites_li', stop reason = breakpoint 1.3
# | ^
# |
```
Since the remote Shell test execution feature was added, these tests
should now be disabled on Windows target instead of Windows host.
It should fix failures on
https://lab.llvm.org/staging/#/builders/197/builds/76.
This is the beginning of a different, more fundamental approach to
handling. This PR tries to tries to minimize functional changes. It only
makes sure that we store the true set of ranges inside the function
object, so that subsequent patches can make use of it.
This FORM already has support within LLDB to be parsed as a 16-byte
BLOCK, and all that is left to properly support it in the DWARFParser is
to add it to some enums.
With this, I can debug programs that use libstdc++.so.6.0.33 built with
GCC.
This doesn't parse S_CONSTANT case yet, because I found that the global
variable `std::strong_ordering::equal` is a S_CONSTANT and has type of
LF_STRUCTURE which is not currently handled when creating dwarf
expression for the variable. Left a TODO for it to finish later.
This makes `lldb/test/Shell/SymbolFile/PDB/ast-restore.test` and
`lldb/test/Shell/SymbolFile/PDB/calling-conventions-x86.test` pass on
windows with native pdb plugin only.
Remove lldb-repro which was used to run the test suite against a
reproducer. The corresponding functionality has been removed from LLDB
so there's no need for the tool anymore.
Swift types have mangled type names. This adds functionality to look up
those types through the FindTypes API by searching for the mangled type
name instead of the regular name.
Member pointers refer to data or function members of a `CXXRecordDecl`,
which require a `MSInheritanceAttr` in order to be complete. Without that
we cannot calculate the size of a member pointer in memory. The attempt
has been causing a crash further down in the clang AST context. In order
to implement the feature, DWARF will need a new attribtue to convey the
information. For the moment, this patch teaches LLDB to handle to
situation and avoid the crash.
Member pointers refer to data or function members of a `CXXRecordDecl` and
require a `MSInheritanceAttr` in order to be complete. Without that we
cannot calculate their size in memory. The attempt has been causing a crash
further down in the clang AST context. In order to implement the feature,
DWARF will need a new attribtue to convey the information. For the moment,
this patch teaches LLDB to handle to situation and avoid the crash.
I've been getting complaints from users being spammed by -gmodules
missing file warnings going out of control because each object file
depends on an entire DAG of PCM files that usually are all missing at
once. To reduce this problem, this patch does two things:
1. Module now maintains a DenseMap<hash, once> that is used to display
each warning only once, based on its actual text.
2. The PCM warning itself is reworded to include less details, such as
the DIE offset, which is only useful to LLDB developers, who can get
this from the dwarf log if they need it. Because the detail is omitted
the hashing from (1) deduplicates the warnings.
rdar://138144624
Line ending policies were changed in the parent, dccebddb3b. To make
it easier to resolve downstream merge conflicts after line-ending
policies are adjusted this is a separate whitespace-only commit. If you
have merge conflicts as a result, you can simply `git add --renormalize
-u && git merge --continue` or `git add --renormalize -u && git rebase
--continue` - depending on your workflow.
Follow up to https://github.com/llvm/llvm-project/pull/111902.
Makes sure all the `no_unique_address` tests are in the same place and
we don't rely on the host target triple (which means we don't need to
account for `[[msvc::no_unique_address]]` on Windows).
Now that we don't compile with the host compiler, this patch also adds
`-c` to the compilation command since we don't actually need the linked
binary in the test anyway (and on Darwin linking through Clang requires
the `xcrun` prefix to set up the SDK paths, etc.). We already do this in
`no_unique_address-with-bitfields.cpp` anyway.
Fixed the error `unable to create target: 'No available targets are
compatible with triple "x86_64-apple-macosx10.4.0"'` running `clang
--target=x86_64-apple-macosx -c -gdwarf -o %t %s`.
LLDB code for using the type layout data from DWARF was not kicking in
for types which were initially parsed from a declaration. The problem
was in these lines of code:
```
if (type)
layout_info.bit_size = type->GetByteSize(nullptr).value_or(0) * 8;
```
which determine the types layout size by getting the size from the
lldb_private::Type object. The problem is that if the type object does
not have this information cached, this request can trigger another
(recursive) request to lay out/complete the type. This one, somewhat
surprisingly, succeeds, but does that without the type layout
information (because it hasn't been computed yet). The reasons why this
hasn't been noticed so far are:
- this is a relatively new bug. I haven't checked but I suspect it was
introduced in the "delay type definition search" patchset from this
summer -- if we search for the definition eagerly, we will always have a
cached size value.
- it requires the presence of another bug/issue, as otherwise the
automatically computed layout will match the real thing.
- it reproduces (much) more easily with -flimit-debug-info (though it is
possible to trigger it without that flag).
My fix consists of always fetching type size information from DWARF
(which so far existed as a fallback path). I'm not quite sure why this
code was there in the first place (the code goes back to before the
Great LLDB Reformat), but I don't believe it is necessary, as the type
size (for types parsed from definition DIEs) is set exactly from this
attribute (in ParseStructureLikeDIE).
This patch is a reworking of Pete Lawrence's (@PortalPete) proposal
for better expression evaluator error messages:
https://github.com/llvm/llvm-project/pull/80938
Before:
```
$ lldb -o "expr a+b"
(lldb) expr a+b
error: <user expression 0>:1:1: use of undeclared identifier 'a'
a+b
^
error: <user expression 0>:1:3: use of undeclared identifier 'b'
a+b
^
```
After:
```
(lldb) expr a+b
^ ^
│ ╰─ error: use of undeclared identifier 'b'
╰─ error: use of undeclared identifier 'a'
```
This eliminates the confusing `<user expression 0>:1:3` source
location and avoids echoing the expression to the console again, which
results in a cleaner presentation that makes it easier to grasp what's
going on. You can't see it here, bug the word "error" is now also in
color, if so desired.
Depends on https://github.com/llvm/llvm-project/pull/106442.
This patch is a reworking of Pete Lawrence's (@PortalPete) proposal
for better expression evaluator error messages:
https://github.com/llvm/llvm-project/pull/80938
Before:
```
$ lldb -o "expr a+b"
(lldb) expr a+b
error: <user expression 0>:1:1: use of undeclared identifier 'a'
a+b
^
error: <user expression 0>:1:3: use of undeclared identifier 'b'
a+b
^
```
After:
```
(lldb) expr a+b
^ ^
│ ╰─ error: use of undeclared identifier 'b'
╰─ error: use of undeclared identifier 'a'
```
This eliminates the confusing `<user expression 0>:1:3` source
location and avoids echoing the expression to the console again, which
results in a cleaner presentation that makes it easier to grasp what's
going on. You can't see it here, bug the word "error" is now also in
color, if so desired.
Depends on https://github.com/llvm/llvm-project/pull/106442.
This bug surfaced after https://github.com/llvm/llvm-project/pull/105865
(currently reverted, but blocked on this to be relanded).
Because Clang doesn't emit `DW_TAG_member`s for unnamed bitfields, LLDB
has to make an educated guess about whether they existed in the source.
It does so by checking whether there is a gap between where the last
field ended and the currently parsed field starts. In the example test
case, the empty field `padding` was folded into the storage of `data`.
Because the `bit_offset` of `padding` is `0x0` and its `DW_AT_byte_size`
is `0x1`, LLDB thinks the field ends at `0x1` (not quite because we
first round the size to a word size, but this is an implementation
detail), erroneously deducing that there's a gap between `flag` and
`padding`.
This patch adds the notion of "effective field end", which accounts for
fields that share storage. It is set to the end of the storage that the
two fields occupy. Then we use this to check for gaps in the unnamed
bitfield creation logic.
Print a warning when the debugger detects a mismatch between the MD5
checksum in the DWARF 5 line table and the file on disk. The warning is
printed only once per file.
This is the root-cause for the LLDB failures that started occurring
after https://github.com/llvm/llvm-project/pull/105865.
The DWARFASTParserClang has logic to try derive unnamed bitfields from
DWARF offsets. In this case we treat `padding` as a 1-byte size field
that would overlap with `flag`, and decide we need to introduce an
unnamed bitfield into the AST, which is incorrect.
This allows e.g. DWARFDIE::GetName() to return the name of the type when
looking at its declaration (which contains only
DW_AT_declaration+DW_AT_signature). This is similar to how we recurse
through DW_AT_specification when looking for a function name. Llvm dwarf
parser has obtained the same functionality through #99495.
This fixes a bug where we would confuse a type like NS::Outer::Struct
with NS::Struct (because NS::Outer (and its name) was in a type unit).
We need to resolve the type signature to get a hold of the template
argument dies.
The associated test case passes even without this patch, but it only
does it by accident (because the subsequent code considers the types to
be in an anonymous namespace and this not subject to uniqueing). This
will change once my other patch starts resolving names correctly.
The only change vs. the first version of the patch is that I've made
DWARFUnit linking thread-safe/unit. This was necessary because, during
the
indexing step, two skeleton units could attempt to associate themselves
with the split unit.
The original commit message was:
I ran into this when LTO completely emptied two compile units, so they
ended up with the same hash (see #100375). Although, ideally, the
compiler would try to ensure we don't end up with a hash collision even
in this case, guaranteeing their absence is practically impossible. This
patch ensures this situation does not bring down lldb.
…Type
This is needed to ensure we find a type if its definition is in a CU
that wasn't indexed. This can happen if the definition is in some
precompiled code (e.g. the c++ standard library) which was built with
different flags than the rest of the binary.
I ran into this when LTO completely emptied two compile units, so they
ended up with the same hash (see #100375). Although, ideally, the
compiler would try to ensure we don't end up with a hash collision even
in this case, guaranteeing their absence is practically impossible. This
patch ensures this situation does not bring down lldb.
This fixes a regression caused by delayed type definition searching
(#96755 and friends): If we end up adding a member (e.g. a typedef) to a
type that we've already attempted to complete (and failed), the
resulting AST would end up inconsistent (we would start to "forcibly"
complete it, but never finish it), and importing it into an expression
AST would crash.
This patch fixes this by detecting the situation and finishing the
definition as well.
This is an alternative, much simpler implementation of #99305. In this
version I replace the AnyModule wildcard match with a special TypeQuery
flag which achieves (mostly) the same thing.
It is a preparatory step for teaching ContextMatches about anonymous
namespaces. It started out as a way to remove the assumption that the
pattern and target contexts must be of the same length -- that's will
not be correct with anonymous namespaces, and probably isn't even
correct right now for AnyModule matches.
For the same reasons as 6cfac497e9.
This test was added in https://github.com/llvm/llvm-project/pull/100710.
It fails because when we're linking with link.exe, -gdwarf has no
effect and we get a PDB file anyway. The Windows on Arm lldb bot
uses link.exe.
"C:\\Program Files\\Microsoft Visual Studio\\2022\\Community\\VC\\Tools\\MSVC\\14.34.31933\\bin\\Hostx86\\arm64\\link.exe" <...>
08/01/2024 01:47 PM 2,956,488 vla.cpp.ilk
08/01/2024 01:47 PM 6,582,272 vla.cpp.pdb
08/01/2024 01:47 PM 734,208 vla.cpp.tmp
Depends on https://github.com/llvm/llvm-project/pull/100674
Currently, we treat VLAs declared as `int[]` and `int[0]` identically.
I.e., we create them as `IncompleteArrayType`s. However, the
`DW_AT_count` for `int[0]` *does* exist, and is retrievable without an
execution context. This patch decouples the notion of "has 0 elements"
from "has no known `DW_AT_count`".
This aligns with how Clang represents `int[0]` in the AST (it treats it
as a `ConstantArrayType` of 0 size).
This issue was surfaced when adapting LLDB to
https://github.com/llvm/llvm-project/issues/93069. There, the
`__compressed_pair_padding` type has a `char[0]` member. If we
previously got the `__compressed_pair_padding` out of a module (where
clang represents `char[0]` as a `ConstantArrayType`), and try to merge
the AST with one we got from DWARF (where LLDB used to represent
`char[0]` as an `IncompleteArrayType`), the AST structural equivalence
check fails, resulting in silent ASTImporter failures. This manifested
in a failure in `TestNonModuleTypeSeparation.py`.
**Implementation**
1. Adjust `ParseChildArrayInfo` to store the element counts of each VLA
dimension as an `optional<uint64_t>`, instead of a regular `uint64_t`.
So when we pass this down to `CreateArrayType`, we have a better
heuristic for what is an `IncompleteArrayType`.
2. In `TypeSystemClang::GetBitSize`, if we encounter a
`ConstantArrayType` simply return the size that it was created with. If
we couldn't determine the authoritative bound from DWARF during parsing,
we would've created an `IncompleteArrayType`. This ensures that
`GetBitSize` on arrays with `DW_AT_count 0x0` returns `0` (whereas
previously it would've returned a `std::nullopt`, causing that
`FieldDecl` to just get dropped during printing)
The help output incorrectly states that this command takes a shared
library name (<shlib-name>) while really it takes a path to a symbol
file.
rdar://131777043
Use `--match-full-lines` with FileCheck. Otherwise, the checks for `int`
and `unsigned int` will match to the fields inside two `CustomType`s and
FileCheck will start scanning from there, causing string not found
error.
This reverts commit 927def4972.
I've refactored the tests so that we're explicit about whether the
enum is signed or not. Which means we use the proper types
throughout.
Fixes#97514
Given this example:
```
enum E {};
int main()
{
E x = E(0);
E y = E(1);
E z = E(2);
return 0;
}
```
lldb used to print nothing for `x`, but `0x1` for `y` and `0x2` for `z`.
At first this seemed like the 0 case needed fixing but the real issue
here is that en enum with no enumerators was being detected as a
"bitfield like enum".
Which is an enum where all enumerators are a single bit value, or the
sum of previous single bit values.
For these we do not print anything for a value of 0, as we assume it
must be the remainder after we've printed the other bits that were set
(I think this is also unfortunate, but I'm not addressing that here).
Clearly an enum with no enumerators cannot be being used as a bitfield,
so check that up front and print it as if it's a normal enum where we
didn't match any of the enumerators. This means you now get:
```
(lldb) p x
(E) 0
(lldb) p y
(E) 1
(lldb) p z
(E) 2
```
Which is a change to decimal from hex, but I think it's overall more
consistent. Printing hex here was never a concious decision.