Introduce a new %Stdio notification category and use it to send process
output asynchronously when running in non-stop mode. This is an LLDB
extension since GDB does not use the 'O' packet for process output,
just for replies to 'qRcmd' packets.
Using the async notification mechanism implies that only the first
output packet is sent immediately to the client. The client needs
to request subsequent notifications (if any) using the new vStdio packet
(that works pretty much like vStopped for the Stop notification queue).
The packet handler in lldb-server tests is updated to handle the async
stdio packets in addition to the regular O packets. However, due
to the implications noted above, it can only handle the first output
packet sent by the server. Subsequent packets need to be explicitly
requested via vStdio.
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.llvm.org/D128849
It turns out that cgroup filtering is relatively trivial and works
really nicely. Thid diffs adds automatic cgroup filtering when in
per-cpu mode, unless a new --disable-cgroup-filtering flag is passed in
the start command. At least on Meta machines, all processes are spawned
inside a cgroup by default, which comes super handy, because per cpu
tracing is now much more precise.
A manual test gave me this result
- Without filtering:
Total number of trace items: 36083
Total number of continuous executions found: 229
Number of continuous executions for this thread: 2
Total number of PSB blocks found: 98
Number of PSB blocks for this thread 2
Total number of unattributed PSB blocks found: 38
- With filtering:
Total number of trace items: 87756
Total number of continuous executions found: 123
Number of continuous executions for this thread: 2
Total number of PSB blocks found: 10
Number of PSB blocks for this thread 3
Total number of unattributed PSB blocks found: 2
Filtering gives us great results. The number of instructions collected
more than double (probalby because we have less noise in the trace), and
we have much less unattributed PSBs blocks and unrelated PSBs in
general. The ones that are unrelated probably belong to other processes
in the same cgroup.
Differential Revision: https://reviews.llvm.org/D129257
LLDB supports having globbing regexes in the process launch arguments
that will be resolved using the user's shell. This requires that we pass
the launch args to the shell and then read back the expanded arguments
using LLDB's argdumper utility.
As the shell will not just expand the globbing regexes but all special
characters, we need to escape all non-globbing charcters such as $, &,
<, >, etc. as those otherwise are interpreted and removed in the step
where we expand the globbing characters. Also because the special
characters are shell-specific, LLDB needs to maintain a list of all the
characters that need to be escaped for each specific shell.
This patch adds the list of special characters that need to be escaped
for fish. Without this patch on systems where fish is the user's shell
having any of these special characters in your arguments or path to
the binary will cause the process launch to fail. E.g., `lldb -- ./calc
1<2` is failing without this patch. The same happens if the absolute
path to calc is in a directory that contains for example parentheses
or other special characters.
Differential revision: https://reviews.llvm.org/D104635
Add a log dump command to dump logs to a file. This only works for
channels that have a log handler associated that supports dumping. For
now that's limited to the circular log handler, but more could be added
in the future.
Differential revision: https://reviews.llvm.org/D128557
Extend vCont function to support resuming a process with an arbitrary
PID, that could be different than the one selected via Hc (or no process
at all may be selected). Resuming more than one process simultaneously
is not supported yet.
Remove the ReadTid() method that was only used by Handle_vCont(),
and furthermore it was wrongly using m_current_process rather than
m_continue_process.
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.llvm.org/D127862
Implement the support for the vKill packet. This is the modern packet
used by the GDB Remote Serial Protocol to kill one of the debugged
processes. Unlike the `k` packet, it has well-defined semantics.
The `vKill` packet takes the PID of the process to kill, and always
replies with an `OK` reply (rather than the exit status, as LLGS does
for `k` packets at the moment). Additionally, unlike the `k` packet
it does not cause the connection to be terminated once the last process
is killed — the client needs to close it explicitly.
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.llvm.org/D127667
Drop the thread-safe flag and make the locking strategy the
responsibility of the individual log handler.
Previously we got away with a non-thread safe mode because we were using
unbuffered streams that rely on the underlying syscalls/OS for
synchronization. With the introduction of log handlers, we can have
arbitrary logic involved in writing out the logs. With this patch the
log handlers can pick the most appropriate locking strategy for their
particular implementation.
Differential revision: https://reviews.llvm.org/D127922
This patch adds a buffered logging mode to lldb. A buffer size can be
passed to `log enable` with the -b flag. If no buffer size is specified,
logging is unbuffered.
Differential revision: https://reviews.llvm.org/D127986
25c8a061c5 / D127048 added an option
for setting the ABI to GNU.
When an object file is loaded, there's only minimal verification
done for the architecture spec set for it, if the object file only
provides one.
However, for i386 object files, the PECOFF object file plugin
provides two architectures, i386-pc-windows and i686-pc-windows.
This picks a totally different codepath in
TargetList::CreateTargetInternal, where it's treated as a fat
binary. This goes through more verifications to see if the
architectures provided by the object file matches what the
platform plugin supports.
The PlatformWindows() constructor explicitly adds the
"i386-pc-windows" and "i686-pc-windows" architectures (even when
running on other architectures), which allows this "fat binary
verification" to succeed for the i386 object files that provide
two architectures.
However, after that commit, if the object file is advertised with
the different environment (either when lldb is built in a mingw
environment, or if that setting is set), the fat binary validation
won't accept the file any longer.
Update ArchSpec::IsEqualTo with more logic for the Windows use
cases; mismatching vendors is not an issue (they don't have any
practical effect on Windows), and GNU and MSVC environments are
compatible to the point that PlatformWindows can handle object
files for both environments/ABIs.
As a separate path forward, one could also consider to stop returning
two architecture specs from ObjectFilePECOFF::GetModuleSpecifications
for i386 files.
Differential Revision: https://reviews.llvm.org/D128268
Implement the support for %Stop asynchronous notification packet format
in LLGS. This does not implement full support for non-stop mode for
threaded programs -- process plugins continue stopping all threads
on every event. However, it will be used to implement asynchronous
events in multiprocess debugging.
The non-stop protocol is enabled using QNonStop packet. When it is
enabled, the server uses notification protocol instead of regular stop
replies. Since all threads are always stopped, notifications are always
generated for all active threads and copied into stop notification
queue.
If the queue was empty, the initial asynchronous %Stop notification
is sent to the client immediately. The client needs to (eventually)
acknowledge the notification by sending the vStopped packet, in which
case it is popped from the queue and the stop reason for the next thread
is reported. This continues until notification queue is empty again,
in which case an OK reply is sent.
Asychronous notifications are also used for vAttach results and program
exits. The `?` packet uses a hybrid approach -- it returns the first
stop reason synchronously, and exposes the stop reasons for remaining
threads via vStopped queue.
The change includes a test case for a program generating a segfault
on 3 threads. The server is expected to generate a stop notification
for the segfaulting thread, along with the notifications for the other
running threads (with "no stop reason"). This verifies that the stop
reasons are correctly reported for all threads, and that notification
queue works.
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.llvm.org/D125575
This patch introduces the concept of a log handlers. Log handlers allow
customizing the way log output is emitted. The StreamCallback class
tried to do something conceptually similar. The benefit of the log
handler interface is that you don't need to conform to llvm's
raw_ostream interface.
Differential revision: https://reviews.llvm.org/D127922
llvm's JSON parser supports 64 bit integers, but other tools like the
ones written in JS don't support numbers that big, so we need to
represent these possibly big numbers as a string. This diff uses that to
represent addresses and tsc zero. The former is printed in hex for and
the latter in decimal string form. The schema was updated mentioning
that.
Besides that, I fixed some remaining issues and now all test pass. Before I wasn't running all tests because for some reason my computer reverted perf_paranoid to 1.
Differential Revision: https://reviews.llvm.org/D127819
As discusses offline with @jj10305, we are updating some naming used throughout the code, specially in the json schema
- traceBuffer -> iptTrace
- core -> cpu
Differential Revision: https://reviews.llvm.org/D127817
This applies the changes requested for diff 12.
- use DenseMap<ConstString, _> instead of std::unordered_map<ConstString, _>, which is more idiomatic and possibly performant.
- deduplicate some code in Trace.cpp by using helper functions for fetching in maps
- stop using size and offset when fetching binary data, because we in fact read the entire buffers all the time. If we ever need streaming, we can implement it then. Now, the size is used only to check that we are getting the correct amount of data. This is useful because in some cases determining the size doesn't involve fetching the actual data.
- added back the x86_64 macro to the perf tests
- added more documentation
- simplified some file handling
- fixed some comments
Differential Revision: https://reviews.llvm.org/D127752
This is the final functional patch to support intel pt decoding per cpu.
It works by doing the following:
- First, all context switches are split by tid and sorted in order. This produces a list of continuous executes per thread per core.
- Then, all intel pt subtraces are split by PSB boundaries and assigned to individual thread continuous executions on the same core by doing simple TSC-based comparisons.
- With this, we have, per thread, a sorted list of continuous executions each one with a list of intel pt subtraces. Up to this point, this is really fast because no instructions were actually decoded.
- Then, each thread can be decoded by traversing their continuous executions and intel pt subtraces. An advantage of having these continuous executions is that we can identify if a continuous exexecution doesn't have intel pt data, and thus has a gap in it. We can later to more sofisticated comparisons to identify if within a continuous execution there are gaps.
I'm adding a test as well.
Differential Revision: https://reviews.llvm.org/D126394
- Add the logic that parses all cpu context switch traces and produces blocks of continuous executions, which will be later used to assign intel pt subtraces to threads and to identify gaps. This logic can also identify if the context switch trace is malformed.
- The continuous executions blocks are able to indicate when there were some contention issues when producing the context switch trace. See the inline comments for more information.
- Update the 'dump info' command to show information and stats related to the multicore decoding flow, including timing about context switch decoding.
- Add the logic to conver nanoseconds to TSCs.
- Fix a bug when returning the context switches. Now they data returned makes sense and even empty traces can be returned from lldb-server.
- Finish the necessary bits for loading and saving a multi-core trace bundle from disk.
- Change some size_t to uint64_t for compatibility with 32 bit systems.
Tested by saving a trace session of a program that sleeps 100 times, it was able to produce the following 'dump info' text:
```
(lldb) trace load /tmp/trace3/trace.json (lldb) thread trace dump info Trace technology: intel-pt
thread #1: tid = 4192415
Total number of instructions: 1
Memory usage:
Total approximate memory usage (excluding raw trace): 2.51 KiB
Average memory usage per instruction (excluding raw trace): 2573.00 bytes
Timing for this thread:
Timing for global tasks:
Context switch trace decoding: 0.00s
Events:
Number of instructions with events: 0
Number of individual events: 0
Multi-core decoding:
Total number of continuous executions found: 2499
Number of continuous executions for this thread: 102
Errors:
Number of TSC decoding errors: 0
```
Differential Revision: https://reviews.llvm.org/D126267
:q!
This diff is massive, but it's because it connects the client with lldb-server
and also ensures that the postmortem case works.
- Flatten the postmortem trace schema. The reason is that the schema has become quite complex due to the new multicore case, which defeats the original purpose of having a schema that could work for every trace plug-in. At this point, it's better that each trace plug-in defines it's own full schema. This means that the only common field is "type".
-- Because of this new approach, I merged the "common" trace load and saving functionalities into the IntelPT one. This simplified the code quite a bit. If we eventually implement another trace plug-in, we can see then what we could reuse.
-- The new schema, which is flattened, has now better comments and is parsed better. A change I did was to disallow hex addresses, because they are a bit error prone. I'm asking now to print the address in decimal.
-- Renamed "intel" to "GenuineIntel" in the schema because that's what you see in /proc/cpuinfo.
- Implemented reading the context switch trace data buffer. I had to do
some refactors to do that cleanly.
-- A major change that I did here was to simplify the perf_event circular buffer reading logic. It was too complex. Maybe the original Intel author had something different in mind.
- Implemented all the necessary bits to read trace.json files with per-core data.
- Implemented all the necessary bits to save to disk per-core trace session.
- Added a test that ensures that parsing and saving to disk works.
Differential Revision: https://reviews.llvm.org/D126015
- Add a warnings field in the jLLDBGetState response, for warnings to be delivered to the client for troubleshooting. This removes the need to silently log lldb-server's llvm::Errors and not expose them easily to the user
- Simplify the tscPerfZeroConversion struct and schema. It used to extend a base abstract class, but I'm doubting that we'll ever add other conversion mechanisms because all modern kernels support perf zero. It is also the one who is supposed to work with the timestamps produced by the context switch trace, so expecting it is imperative.
- Force tsc collection for cpu tracing.
- Add a test checking that tscPerfZeroConversion is returned by the GetState request
- Add a pre-check for cpu tracing that makes sure that perf zero values are available.
Differential Revision: https://reviews.llvm.org/D125932
- Add collection of context switches per cpu grouped with the per-cpu intel pt traces.
- Move the state handling from the interl pt trace class to the PerfEvent one.
- Add support for stopping and enabling perf event groups.
- Return context switch entries as part of the jLLDBTraceGetState response.
- Move the triggers of whenever the process stopped or resumed. Now the will-resume notification is in a better location, which will ensure that we'll capture the instructions that will be executed.
- Remove IntelPTSingleBufferTraceUP. The unique pointer was useless.
- Add unit tests
Differential Revision: https://reviews.llvm.org/D125897
should not receive as exceptions (some will get converted to BSD
signals instead). This is really the only stable way to ensure that
a Mach exception gets converted to it's equivalent BSD signal. For
programs that rely on BSD signal handlers, this has to happen or you
can't even get the program to invoke the signal handler when under
the debugger.
This builds on a previous solution to this problem which required you
start debugserver with the -U flag. This was not very discoverable
and required lldb be the one to launch debugserver, which is not always
the case.
Differential Revision: https://reviews.llvm.org/D125434
This diffs implements per-core tracing on lldb-server. It also includes tests that ensure that tracing can be initiated from the client and that the jLLDBGetState ppacket returns the list of trace buffers per core.
This doesn't include any decoder changes.
Finally, this makes some little changes here and there improving the existing code.
A specific piece of code that can't reliably be tested is when tracing
per core fails due to permissions. In this case we add a
troubleshooting message and this is the manual test:
```
/proc/sys/kernel/perf_event_paranoid set to 1
(lldb) process trace start --per-core-tracing error: perf event syscall failed: Permission denied
You might need that /proc/sys/kernel/perf_event_paranoid has a value of 0 or -1.
``
Differential Revision: https://reviews.llvm.org/D124858
llvm's json parser supports uint64_t, so let's better use it for the
packets being sent between lldb and lldb-server instead of using int64_t
as an intermediate type, which might be error-prone.
llvm's json parser supports uint64_t, so let's better use it for the
packets being sent between lldb and lldb-server instead of using int64_t
as an intermediate type, which might be error-prone.
I'm refactoring IntelPTThreadTrace into IntelPTSingleBufferTrace so that it can
both single threads or single cores. In this diff I'm basically renaming the
class, moving it to its own file, and removing all the pieces that are not used
along with some basic cleanup.
Differential Revision: https://reviews.llvm.org/D124648
This updates the documentation of the gdb-remote protocol, as well as the help messages, to include the new --per-core-tracing option.
Differential Revision: https://reviews.llvm.org/D124640
Currently, ppc64le and ppc64 (defaulting to big endian) have the same
descriptor, thus the linear scan always return ppc64le. Handle that through
subtype.
This is a recommit of f114f00948 with a new test
setup that doesn't involves (unsupported) corefiles.
Differential Revision: https://reviews.llvm.org/D124760
This reverts commit f114f00948.
Due to hitting an assert on our lldb bots:
https://lab.llvm.org/buildbot/#/builders/96/builds/22715
../llvm-project/lldb/source/Plugins/Process/elf-core/ThreadElfCore.cpp:170:
virtual lldb::RegisterContextSP ThreadElfCore::CreateRegisterContextForFrame(
lldb_private::StackFrame *): Assertion `false && "Architecture or OS not supported"' failed.
Currently, ppc64le and ppc64 (defaulting to big endian) have the same
descriptor, thus the linear scan always return ppc64le. Handle that through
subtype.
Differential Revision: https://reviews.llvm.org/D124760
In order to open perf events per core, we need to first get the list of
core ids available in the system. So I'm adding a function that does
that by parsing /proc/cpuinfo. That seems to be the simplest and most
portable way to do that.
Besides that, I made a few refactors and renames to reflect better that
the cpu info that we use in lldb-server comes from procfs.
Differential Revision: https://reviews.llvm.org/D124573
This diff introduces a new symbol on-demand which skips
loading a module's debug info unless explicitly asked on
demand. This provides significant performance improvement
for application with dynamic linking mode which has large
number of modules.
The feature can be turned on with:
"settings set symbols.load-on-demand true"
The feature works by creating a new SymbolFileOnDemand class for
each module which wraps the actual SymbolFIle subclass as member
variable. By default, most virtual methods on SymbolFileOnDemand are
skipped so that it looks like there is no debug info for that module.
But once the module's debug info is explicitly requested to
be enabled (in the conditions mentioned below) SymbolFileOnDemand
will allow all methods to pass through and forward to the actual SymbolFile
which would hydrate module's debug info on-demand.
In an internal benchmark, we are seeing more than 95% improvement
for a 3000 modules application.
Currently we are providing several ways to on demand hydrate
a module's debug info:
* Source line breakpoint: matching in supported files
* Stack trace: resolving symbol context for an address
* Symbolic breakpoint: symbol table match guided promotion
* Global variable: symbol table match guided promotion
In all above situations the module's debug info will be on-demand
parsed and indexed.
Some follow-ups for this feature:
* Add a command that allows users to load debug info explicitly while using a
new or existing command when this feature is enabled
* Add settings for "never load any of these executables in Symbols On Demand"
that takes a list of globs
* Add settings for "always load the the debug info for executables in Symbols
On Demand" that takes a list of globs
* Add a new column in "image list" that shows up by default when Symbols On
Demand is enable to show the status for each shlib like "not enabled for
this", "debug info off" and "debug info on" (with a single character to
short string, not the ones I just typed)
Differential Revision: https://reviews.llvm.org/D121631
Unlike for any of the other shells, we were escaping $ when using tcsh.
There's nothing special about $ in tcsh and this prevents you from
expanding shell variables, one of the main reasons this functionality
exists in the first place.
Differential revision: https://reviews.llvm.org/D123690
LLDB supports having globbing regexes in the process launch arguments
that will be resolved using the user's shell. This requires that we pass
the launch args to the shell and then read back the expanded arguments
using LLDB's argdumper utility.
As the shell will not just expand the globbing regexes but all special
characters, we need to escape all non-globbing charcters such as $, &,
<, >, etc. as those otherwise are interpreted and removed in the step
where we expand the globbing characters. Also because the special
characters are shell-specific, LLDB needs to maintain a list of all the
characters that need to be escaped for each specific shell.
This patch adds the missing semicolon character to the escape list for
all currently supported shells. Without this having a semicolon in the
binary path or having a semicolon in the launch arguments will cause the
argdumping process to fail. E.g., lldb -- ./calc "a;b" was failing
before but is working now.
Fixes rdar://55776943
Differential revision: https://reviews.llvm.org/D104629
Currently, all data buffers are assumed to be writable. This is a
problem on macOS where it's not allowed to load unsigned binaries in
memory as writable. To be more precise, MAP_RESILIENT_CODESIGN and
MAP_RESILIENT_MEDIA need to be set for mapped (unsigned) binaries on our
platform.
Binaries are mapped through FileSystem::CreateDataBuffer which returns a
DataBufferLLVM. The latter is backed by a llvm::WritableMemoryBuffer
because every DataBuffer in LLDB is considered to be writable. In order
to use a read-only llvm::MemoryBuffer I had to split our abstraction
around it.
This patch distinguishes between a DataBuffer (read-only) and
WritableDataBuffer (read-write) and updates LLDB to use the appropriate
one.
rdar://74890607
Differential revision: https://reviews.llvm.org/D122856
I found this function somewhat hard to read and removed a few entirely
redundant checks and converted it to early exits.
Differential Revision: https://reviews.llvm.org/D122912
Applied modernize-use-equals-default clang-tidy check over LLDB.
This check is already present in the lldb/.clang-tidy config.
Differential Revision: https://reviews.llvm.org/D121844
Update the response schema of the TraceGetState packet and add
Intel PT specific response structure that contains the TSC conversion,
if it exists. The IntelPTCollector loads the TSC conversion and caches
it to prevent unnecessary calls to perf_event_open. Move the TSC conversion
calculation from Perf.h to TraceIntelPTGDBRemotePackets.h to remove
dependency on Linux specific headers.
Differential Revision: https://reviews.llvm.org/D122246
Applied modernize-use-default-member-init clang-tidy check over LLDB.
It appears in many files we had already switched to in class member init but
never updated the constructors to reflect that. This check is already present in
the lldb/.clang-tidy config.
Differential Revision: https://reviews.llvm.org/D121481