Removes the debugger broadcast bits from `Debugger.h` and instead uses
the enum from `lldb-enumerations.h` and adds the
`eBroadcastSymbolChange` bit to the enum in `lldb-enumerations.h`. This fixes a bug wherein the incorrect broadcast bit could be referenced due both of these enums previously existing and being out-of-sync with each other.
Adds a `show_function_display_name` parameter to
`SymbolContext::DumpStopContext`. This
parameter defaults to false, but `BreakpointLocation::GetDescription`
sets it to true.
This is NFC in mainline lldb, and will be used to modify how Swift
breakpoint locations are printed.
Give language plugins the opportunity to provide a language specific
display name.
This will be used in a follow up commit. The purpose of this change is
to ultimately display breakpoint locations with a more human friendly
demangling of Swift symbols.
Adds support for applying LLVM formatting to variables.
The reason for this is to support cases such as the following.
Let's say you have two separate bytes that you want to print as a
combined hex value. Consider the following summary string:
```
${var.byte1%x}${var.byte2%x}
```
The output of this will be: `0x120x34`. That is, a `0x` prefix is
unconditionally applied to each byte. This is unlike printf formatting
where you must include the `0x` yourself.
Currently, there's no way to do this with summary strings, instead
you'll need a summary provider in python or c++.
This change introduces formatting support using LLVM's formatter system.
This allows users to achieve the desired custom formatting using:
```
${var.byte1:x-}${var.byte2:x-}
```
Here, each variable is suffixed with `:x-`. This is passed to the LLVM
formatter as `{0:x-}`. For integer values, `x` declares the output as
hex, and `-` declares that no `0x` prefix is to be used. Further, one
could write:
```
${var.byte1:x-2}${var.byte2:x-2}
```
Where the added `2` results in these bytes being written with a minimum
of 2 digits.
An alternative considered was to add a new format specifier that would
print hex values without the `0x` prefix. The reason that approach was
not taken is because in addition to forcing a `0x` prefix, hex values
are also forced to use leading zeros. This approach lets the user have
full control over formatting.
These are hardcoded strings that are already present in the data section
of the binary, no need to immediately place them in the ConstString
StringPools. Lots of code still calls `GetBroadcasterClass` and places
the return value into a ConstString. Changing that would be a good
follow-up.
Additionally, calls to these functions are still wrapped in ConstStrings
at the SBAPI layer. This is because we must guarantee the lifetime of
all strings handed out publicly.
This member variable is completely unused. I also don't think it makes a
ton of sense since (1) The "base address" can be obtained from the first
Instruction in its InstructionList, and (2) InstructionLists may not be
a series of contiguous instructions (even though they are most of the
time).
In
commit 2f63718f85
Author: Jason Molenda <jmolenda@apple.com>
Date: Tue Mar 26 09:07:15 2024 -0700
[lldb] Don't clear a Module's UnwindTable when adding a SymbolFile
(#86603)
I stopped clearing a Module's UnwindTable when we add a SymbolFile to
avoid the memory management problems with adding a symbol file
asynchronously while the UnwindTable is being accessed on another
thread. This broke the target-symbols-add-unwind.test shell test on
Linux which removes the DWARF debub_frame section from a binary, loads
it, then loads the unstripped binary with the DWARF debug_frame section
and checks that the UnwindPlans for a function include debug_frame.
I originally decided that I was willing to sacrifice the possiblity of
additional unwind sources from a symbol file because we rely on assembly
emulation so heavily, they're rarely critical. But there are targets
where we we don't have emluation and rely on things like DWARF
debug_frame a lot more, so this probably wasn't a good choice.
This patch adds a new UnwindTable::Update method which looks for any new
sources of unwind information and adds it to the UnwindTable, and calls
that after a new SymbolFile has been added to a Module.
This implements coalescing of progress events using a timeout, as
discussed in the RFC on Discourse [1]. This PR consists of two commits
which, depending on the feedback, I may split up into two PRs. For now,
I think it's easier to review this as a whole.
1. The first commit introduces a new generic `Alarm` class. The class
lets you to schedule a function (callback) to be executed after a given
timeout expires. You can cancel and reset a callback before its
corresponding timeout expires. It achieves this with the help of a
worker thread that sleeps until the next timeout expires. The only
guarantee it provides is that your function is called no sooner than the
requested timeout. Because the callback is called directly from the
worker thread, a long running callback could potentially block the
worker thread. I intentionally kept the implementation as simple as
possible while addressing the needs for the `ProgressManager` use case.
If we want to rely on this somewhere else, we can reassess whether we
need to address those limitations.
2. The second commit uses the Alarm class to coalesce progress events.
To recap the Discourse discussion, when multiple progress events with
the same title execute in close succession, they get broadcast as one to
`eBroadcastBitProgressCategory`. The `ProgressManager` keeps track of
the in-flight progress events and when the refcount hits zero, the Alarm
class is used to schedule broadcasting the event. If a new progress
event comes in before the alarm fires, the alarm is reset (and the
process repeats when the new progress event ends). If no new event comes
in before the timeout expires, the progress event is broadcast.
[1]
https://discourse.llvm.org/t/rfc-improve-lldb-progress-reporting/75717/
Fixing a crash in lldb when `symbols.auto-download` setting is enabled.
When doing a backtrace, this feature has lldb search for a SymbolFile
for stack frames when we are backtracing, and add them either
synchoronously or asynchronously, depending on the specific setting
used.
Module::SetSymbolFileFileSpec clears the Module's UnwindTable, once we
find a new SymbolFile. We may be adding a source of unwind information
that we did not have when lldb was working only with the executable
binary.
What happens in practice is that we're using a reference to the Module's
UnwindTable, and then the other thread getting the SymbolFile clears it
and now the first thread is referring to freed memory and we can crash.
When built with address sanitizer, it crashes much more reliably.
Given that unwind information used for exception handling -- eh_frame,
compact unwind -- is present in executable binaries, the only thing
we're likely to *add* would be DWARF's `debug_frame` if that was also
available. The actual value of re-creating the UnwindTable when we have
added a SymbolFile is not large.
I also tried fixing this by changing the Module to have a shared_ptr to
the UnwindTable, so we could have two different UnwindTable's in use
simultaneously for a brief period. This would be fine TODAY, but it
introduces a very subtle bug that someone will have a heck of a time
figuring out in the future.
In the end, I believe the safest approach is to sacrifice the possible
marginal gain of reconstructing the UnwindTable once a SymbolFile has
been added, to sidestep this whole problem area.
Also, in `Module::GetUnwindTable()`, call `DownloadSymbolFileAsync`
before we create the UnwindTable for the first time, in case the symbol
file is fetched synchronously, we will have it for that possible
marginal gain.
This implements coalescing of progress events using a timeout, as
discussed in the RFC on Discourse [1]. This PR consists of two commits
which, depending on the feedback, I may split up into two PRs. For now,
I think it's easier to review this as a whole.
1. The first commit introduces a new generic `Alarm` class. The class
lets you to schedule a function (callback) to be executed after a given
timeout expires. You can cancel and reset a callback before its
corresponding timeout expires. It achieves this with the help of a
worker thread that sleeps until the next timeout expires. The only
guarantee it provides is that your function is called no sooner than the
requested timeout. Because the callback is called directly from the
worker thread, a long running callback could potentially block the
worker thread. I intentionally kept the implementation as simple as
possible while addressing the needs for the `ProgressManager` use case.
If we want to rely on this somewhere else, we can reassess whether we
need to address those limitations.
2. The second commit uses the Alarm class to coalesce progress events.
To recap the Discourse discussion, when multiple progress events with
the same title execute in close succession, they get broadcast as one to
`eBroadcastBitProgressCategory`. The `ProgressManager` keeps track of
the in-flight progress events and when the refcount hits zero, the Alarm
class is used to schedule broadcasting the event. If a new progress
event comes in before the alarm fires, the alarm is reset (and the
process repeats when the new progress event ends). If no new event comes
in before the timeout expires, the progress event is broadcast.
[1]
https://discourse.llvm.org/t/rfc-improve-lldb-progress-reporting/75717/
This is another step towards supporting DWARF5 checksums and inline
source code in LLDB. This is a reland of #85468 but without the
functional change of storing the support file from the line table (yet).
Some languages may create artificial functions that have no real user
code, even though there is line table information for them. One such
case is with coroutine code that receives the CoroSplitter
transformation in LLVM IR. That code transformation creates many
different Functions, cloning one Instruction into many Instructions in
many different Functions and copying the associated debug locations.
It would be difficult to make that pass delete debug locations of cloned
instructions in a language agnostic way (is it even possible?), but LLDB
can ignore certain locations by querying its Language APIs and having it
decide based on, for example, mangling information.
MSVC fails when there is ambiguity (multiple options) around implicit
type conversion operators.
Make ConstString's conversion operator to string_view explicit to avoid
ambiguity with one to StringRef and remove an unused local variable that
MSVC also fails on.
The ValueObjectConstResult classes that back expression result variables
play a complicated game with where the data for their values is stored.
They try to make it appear as though they are still tied to the memory
in the target into which their value was written when the expression is
run, but they also keep a copy in the Host which they use after the
value is made (expression results are "history values" so that's how we
make sure they have "the value at the time of the expression".)
However, that means that if you ask them to cast themselves to a value
bigger than their original size, they don't have a way to get more
memory for that purpose. The same thing is true of ValueObjects backed
by DataExtractors, the data extractors don't know how to get more data
than they were made with in general.
The only place where we actually ask ValueObjects to sample outside
their captured bounds is when you do ValueObject::Cast from one
structure type to a bigger structure type. In
https://reviews.llvm.org/D153657 I handled this by just disallowing
casts from one structure value to a larger one. My reasoning at the time
was that the use case for this was to support discriminator based C
inheritance schemes, and you can't directly cast values in C, only
pointers, so this was not a natural way to handle those types. It seemed
logical that since you would have had to start with pointers in the
implementation, that's how you would write your lldb introspection code
as well.
Famous last words...
Turns out there are some heavy users of the SB API's who were relying on
this working, and this is a behavior change, so this patch makes this
work in the cases where it used to work before, while still disallowing
the cases we don't know how to support.
Note that if you had done this Cast operation before with either
expression results or value objects from data extractors, lldb would not
have returned the correct results, so the cases this patch outlaws are
ones that actually produce invalid results. So nobody should be using
Cast in these cases, or if they were, this patch will point out the bug
they hadn't yet noticed.
Walter Erquinigo added optional instruction annotations for x86
instructions in 2022 for the `thread trace dump instruction` command,
and code to DisassemblerLLVMC to add annotations for instructions that
change flow control, v. https://reviews.llvm.org/D128477
This was added as an option to `disassemble`, and the trace dump command
enables it by default, but several other instruction dumpers were
changed to display them by default as well. These are only implemented
for Intel instructions, so our disassembly on other targets ends up
looking like
```
(lldb) x/5i 0x1000086e4
0x1000086e4: 0xa9be6ffc unknown stp x28, x27, [sp, #-0x20]!
0x1000086e8: 0xa9017bfd unknown stp x29, x30, [sp, #0x10]
0x1000086ec: 0x910043fd unknown add x29, sp, #0x10
0x1000086f0: 0xd11843ff unknown sub sp, sp, #0x610
0x1000086f4: 0x910c63e8 unknown add x8, sp, #0x318
```
instead of `disassemble`'s output style of
```
lldb`main:
lldb[0x1000086e4] <+0>: stp x28, x27, [sp, #-0x20]!
lldb[0x1000086e8] <+4>: stp x29, x30, [sp, #0x10]
lldb[0x1000086ec] <+8>: add x29, sp, #0x10
lldb[0x1000086f0] <+12>: sub sp, sp, #0x610
lldb[0x1000086f4] <+16>: add x8, sp, #0x318
```
Adding symbolic annotations for assembly instructions is something I'm
interested in too, because we may have users investigating a crash or
apparent-incorrect behavior who must debug optimized assembly and they
may not be familiar with the ISA they're using, so short of flipping
through a many-thousand-page PDF to understand each instruction, they're
lost. They don't write assembly or work at that level, but to understand
a bug, they have to understand what the instructions are actually doing.
But the annotations that exist today don't move us forward much on that
front - I'd argue that the flow control instructions on Intel are not
hard to understand from their names, but that might just be my personal
bias. Much trickier instructions exist in any event.
Displaying this information by default for all targets when we only have
one class of instructions on one target is not a good default.
Also, in 2011 when Greg implemented the `memory read -f i` (aka `x/i`)
command
```
commit 5009f9d501
Author: Greg Clayton <gclayton@apple.com>
Date: Thu Oct 27 17:55:14 2011 +0000
[...]
eFormatInstruction will print out disassembly with bytes and it will use the
current target's architecture. The format character for this is "i" (which
used to be being used for the integer format, but the integer format also has
"d", so we gave the "i" format to disassembly), the long format is
"instruction".
```
he had DumpDataExtractor's DumpInstructions print the bytes of the
instruction -- that's the first field we see above for the `x/5i` after
the address -- and this is only useful for people who are debugging the
disassembler itself, I would argue. I don't want this displayed by
default either.
tl;dr this patch removes both fields from `memory read -f -i` and I
think this is the right call today. While I'm really interested in
instruction annotation, I don't think `x/i` is the right place to have
it enabled by default unless it's really compelling on at least some of
our major targets.
Change GetNumChildren()/CalculateNumChildren() methods return
llvm::Expected
This is an NFC change that does not yet add any error handling or change
any code to return any errors.
This is the second big change in the patch series started with
https://github.com/llvm/llvm-project/pull/83501
A follow-up PR will wire up error handling.
Change GetNumChildren()/CalculateNumChildren() methods return
llvm::Expected
This is an NFC change that does not yet add any error handling or change
any code to return any errors.
This is the second big change in the patch series started with
https://github.com/llvm/llvm-project/pull/83501
A follow-up PR will wire up error handling.
Currently, progress events reported by the ProgressManager and broadcast
to eBroadcastBitProgressCategory always specify they're complete. The
problem is that the ProgressManager reports kNonDeterministicTotal for
both the total and the completed number of (sub)events. Because the
values are the same, the event reports itself as complete.
This patch fixes the issue by reporting 0 as the completed value for the
start event and kNonDeterministicTotal for the end event.
llvm-project/lldb/source/Core/Debugger.cpp:107:14:
error: no type named 'DefaultThreadPoolThreadPool' in namespace 'llvm'
static llvm::DefaultThreadPoolThreadPool *g_thread_pool = nullptr;
~~~~~~^
1 error generated.
The base class llvm::ThreadPoolInterface will be renamed
llvm::ThreadPool in a subsequent commit.
This is a breaking change: clients who use to create a ThreadPool must
now create a DefaultThreadPool instead.
This commit adds the functionality to broadcast events using the
`Debugger::eBroadcastProgressCategory`
bit (https://github.com/llvm/llvm-project/pull/81169) by keeping track
of these reports with the `ProgressManager`
class (https://github.com/llvm/llvm-project/pull/81319). The new bit is
used in such a way that it will only broadcast the initial and final
progress reports for specific progress categories that are managed by
the progress manager.
This commit also adds a new test to the progress report unit test that
checks that only the initial/final reports are broadcasted when using
the new bit.
afd469023a fixed the type of the term-width setting but the getter
(Debugger::GetTerminalWidth) was still trying to get the terminal width
as an unsigned. This fixes TestXMLRegisterFlags.py.
I noticed that the term-width setting would always report its default
value (80) despite the driver correctly setting the value with
SBDebugger::SetTerminalWidth.
```
(lldb) settings show term-width
term-width (int) = 80
```
The issue is that the setting was defined as a SInt64 instead of a
UInt64 while the getter returned an unsigned value. There's no reason
the terminal width should be a signed value. My best guess it that it
was using SInt64 because UInt64 didn't support min and max values. I
fixed that and correct the type and now lldb reports the correct
terminal width:
```
(lldb) settings show term-width
term-width (unsigned) = 189
```
rdar://123488999
Per discussions from https://github.com/llvm/llvm-project/pull/81026, it
was decided that having a class that manages a map of progress reports
would be beneficial in order to categorize them. This class is a part of
the overall `Progress` class and utilizes a map that keeps a count of
how many times a progress report category has been sent. This would be
used with the new debugger broadcast bit added in
https://github.com/llvm/llvm-project/pull/81169 so that a user listening
with that bit will receive grouped progress reports.
Adds a comment to indicate intention of a piece of value printing code.
I was initially surprised to see this code (distilled for emphasis):
```cpp
if (str.empty()) {
if (style == eValueObjectRepresentationStyleValue)
str = GetSummaryAsCString();
else if (style == eValueObjectRepresentationStyleSummary)
str = GetValueAsCString();
}
```
My first thought was "is this a bug?", but I realized it was likely intentional. This
change adds a comment to indicate yes, this is intentional.
I get a small but fairly steady stream of crash reports which I can only
explain by ValueObjectPrinter trying to access its m_valobj field, and
finding it NULL. I have never been able to reproduce any of these, and
the reports show a state too long after the fact to know what went
wrong.
I've read through this section of lldb a bunch of times trying to figure
out how this could happen, but haven't ever found anything actually
wrong that could cause this. OTOH, ValueObjectPrinter is somewhat sloppy
about how it handles the ValueObject it is printing.
a) lldb allows you to make a ValueObjectPrinter with a Null incoming
ValueObject. However, there's no affordance to set the ValueObject in
the Printer after the fact, and it doesn't really make sense to do that.
So I change the ValueObjectPrinter API's to take a ValueObject
reference, rather than a pointer. All the places that make
ValueObjectPrinters already check the non-null status of their
ValueObject's before making the ValueObjectPrinter, so sadly, I didn't
find the bug, but this will enforce the intent.
b) The next step in printing the ValueObject is deciding which of the
associated DynamicValue/SyntheticValue we are actually printing (based
on the use_dynamic and use_synthetic settings in the original
ValueObject. This was put in a pointer by GetMostSpecializedValue, but
most of the printer code just accessed the pointer, and it was hard to
reason out whether we were guaranteed to always call this before using
m_valobj. So far as I could see we always do (sigh, didn't find the bug
there either) but this was way too hard to reason about.
In fact, we figure out once which ValueObject we're going to print and
don't change that through the life of the printer. So I changed this to
both set the "most specialized value" in the constructor, and then to
always access it through GetMostSpecializedValue(). That makes it easier
to reason about the use of this ValueObject as well.
This is an NFC change, all it does is make the code easier to reason
about.
LLDB has a setting (symbols.enable-background-lookup) that calls
dsymForUUID on a background thread for images as they appear in the
current backtrace. Originally, the laziness of only looking up symbols
for images in the backtrace only existed to bring the number of
dsymForUUID calls down to a manageable number.
Users have requesting the same functionality but blocking. This gives
them the same user experience as enabling dsymForUUID globally, but
without the massive upfront cost of having to download all the images,
the majority of which they'll likely not need.
This patch renames the setting to have a more generic name
(symbols.auto-download) and changes its values from a boolean to an
enum. Users can now specify "off", "background" and "foreground". The
default remains "off" although I'll probably change that in the near
future.
LLDB has a setting (symbols.enable-background-lookup) that calls
dsymForUUID on a background thread for images as they appear in the
current backtrace. Originally, the laziness of only looking up symbols
for images in the backtrace only existed to bring the number of
dsymForUUID calls down to a manageable number.
Users have requesting the same functionality but blocking. This gives
them the same user experience as enabling dsymForUUID globally, but
without the massive upfront cost of having to download all the images,
the majority of which they'll likely not need.
This patch renames the setting to have a more generic name
(symbols.auto-download) and changes its values from a boolean to an
enum. Users can now specify "off", "background" and "foreground". The
default remains "off" although I'll probably change that in the near
future.
Refactors logic in `ParseInternal` that was previously calling
`GetFormatFromCString` twice, once with `partial_match_ok` set to false,
and the second time set to true.
With this change, lldb formats (ie `%@`, `%S`, etc) are checked first.
If a format is not one of those, then `GetFormatFromCString` is called
once, and now always checks for partial matches.
The `total` parameter for the constructor for Progress was changed to a
std::optional in https://github.com/llvm/llvm-project/pull/77547. It was originally set to 1 to indicate non-determinisitic progress, but this commit changes this. First, `UINT64_MAX` will again be used for non-deterministic progress, and `Progress` now has a static variable set to this value so that we can use this instead of a magic number.
The member variable `m_total` could be changed to a std::optional as
well, but this means that the `ProgressEventData::GetTotal()` (which is
used for the public API) would
either need to return a std::optional value or it would return some
specific integer to represent non-deterministic progress if `m_total` is
std::nullopt.
This reverts commit d657519838 as it
breaks two dozen tests. The breakages are related to variable path
expression parsing and summary string parsing (possibly the same code).
As I worked through changes to another PR
(https://github.com/llvm/llvm-project/pull/74912), I couldn't help but
rewrite a few methods for readability, maintainability, and possibly
some behavior correctness too.
1. Exiting early instead of nested `if`-statements, which:
- Reduces indentation levels for all subsequent lines
- Treats missing pre-conditions similar to an error
- Clearly indicates that the full length of the method is the "happy
path".
2. Explicitly return empty Value Object shared pointers for those error
(like) situations, which
- Reduces the time it takes a maintainer to figure out what the method
actually returns based on those conditions.
3. Converting a mix of `if` and `if`-`else`-statements around an enum
into one `switch` statement, which:
- Consolidates the former branching logic
- Lets the compiler warn you of a (future) missing enum case
- This one may actually change behavior slightly, because what was an
early test for one enum case, now happens later on in the `switch`.
4. Consolidating near-identical, "copy-pasta" logic into one place,
which:
- Separates the common code to the diverging paths.
- Highlights the differences between the code paths.
rdar://119833526
Follow-up to #69422.
This PR puts all the highlighting settings into a single struct for
easier handling
Co-authored-by: Talha Tahir <talha.tahir@10xengineers.ai>