`llvm::convertUTF16ToUTF8String` opens with an assertion that the output
container is empty:
3bdf9a0880/llvm/lib/Support/ConvertUTFWrapper.cpp (L83-L84)
It's not clear to me why this function requires the output container to
be empty instead of just overwriting it, but the callsite in
`TargetThreadWindows::GetName` may reuse the container without clearing
it out first, resulting in an assertion failure:
```
# Child-SP RetAddr Call Site
00 000000d2`44b8ea48 00007ff8`beefc12e ntdll!NtTerminateProcess+0x14
01 000000d2`44b8ea50 00007ff8`bcf518ab ntdll!RtlExitUserProcess+0x11e
02 000000d2`44b8ea80 00007ff8`bc0e0143 KERNEL32!ExitProcessImplementation+0xb
03 000000d2`44b8eab0 00007ff8`bc0e4c49 ucrtbase!common_exit+0xc7
04 000000d2`44b8eb10 00007ff8`bc102ae6 ucrtbase!abort+0x69
05 000000d2`44b8eb40 00007ff8`bc102cc1 ucrtbase!common_assert_to_stderr<wchar_t>+0x6e
06 000000d2`44b8eb80 00007fff`b8e27a80 ucrtbase!wassert+0x71
07 000000d2`44b8ebb0 00007fff`b8b821e1 liblldb!llvm::convertUTF16ToUTF8String+0x30 [D:\r\_work\swift-build\swift-build\SourceCache\llvm-project\llvm\lib\Support\ConvertUTFWrapper.cpp @ 88]
08 000000d2`44b8ec30 00007fff`b83e9aa2 liblldb!lldb_private::TargetThreadWindows::GetName+0x1b1 [D:\r\_work\swift-build\swift-build\SourceCache\llvm-project\lldb\source\Plugins\Process\Windows\Common\TargetThreadWindows.cpp @ 198]
09 000000d2`44b8eca0 00007ff7`2a3c3c14 liblldb!lldb::SBThread::GetName+0x102 [D:\r\_work\swift-build\swift-build\SourceCache\llvm-project\lldb\source\API\SBThread.cpp @ 432]
0a 000000d2`44b8ed70 00007ff7`2a3a5ac6 lldb_dap!lldb_dap::CreateThread+0x1f4 [S:\SourceCache\llvm-project\lldb\tools\lldb-dap\JSONUtils.cpp @ 877]
0b 000000d2`44b8ef10 00007ff7`2a3b0ab5 lldb_dap!`anonymous namespace'::request_threads+0xa6 [S:\SourceCache\llvm-project\lldb\tools\lldb-dap\lldb-dap.cpp @ 3906]
0c 000000d2`44b8f010 00007ff7`2a3b0fe8 lldb_dap!lldb_dap::DAP::HandleObject+0x1c5 [S:\SourceCache\llvm-project\lldb\tools\lldb-dap\DAP.cpp @ 796]
0d 000000d2`44b8f130 00007ff7`2a3a8b96 lldb_dap!lldb_dap::DAP::Loop+0x78 [S:\SourceCache\llvm-project\lldb\tools\lldb-dap\DAP.cpp @ 812]
0e 000000d2`44b8f1d0 00007ff7`2a4b5fbc lldb_dap!main+0x1096 [S:\SourceCache\llvm-project\lldb\tools\lldb-dap\lldb-dap.cpp @ 5319]
0f (Inline Function) --------`-------- lldb_dap!invoke_main+0x22 [D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl @ 78]
10 000000d2`44b8fb80 00007ff8`bcf3e8d7 lldb_dap!__scrt_common_main_seh+0x10c [D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl @ 288]
11 000000d2`44b8fbc0 00007ff8`beefbf6c KERNEL32!BaseThreadInitThunk+0x17
12 000000d2`44b8fbf0 00000000`00000000 ntdll!RtlUserThreadStart+0x2c
```
This stack trace was captured from the lldb distributed in the Swift
toolchain. The issue is easy to reproduce by resuming from a breakpoint
twice in VS Code.
I've verified that clearing out the container here fixes the assertion
failure.
This commit makes ThreadMemory a real "forwarder" class by implementing
the missing queue methods: they will just call the corresponding backing
thread method.
To make this patch NFC(*) and not change the behavior of the Python OS
plugin, NamedThreadMemoryWithQueue also overrides these methods to
simply call the `Thread` method, just as it was doing before. This also
makes it obvious that there are missing pieces of this class if it were
to provide full queue support.
(*) This patch is NFC in the sense that all llvm.org plugins will not
have any behavior change, but downstream consumers of ThreadMemory will
benefit from the newly implemented forwarding methods.
ThreadMemory attempts to be a class abstracting the notion of a "fake"
Thread that is backed by a "real" thread. According to its
documentation, it is meant to be a class forwarding most methods to the
backing thread, but it does so only for a handful of methods.
Along the way, it also tries to represent a Thread that may or may not
have a different name, and may or may not have a different queue from
the backing thread. This can be problematic for a couple of reasons:
1. It forces all users into this optional behavior.
2. The forwarding behavior is incomplete: not all methods are currently
being forwarded properly. Some of them involve queues and seem to have
been intentionally left unimplemented.
This commit creates the following separation:
ThreadMemory <- ThreadMemoryProvidingName <-
ThreadMemoryProvidingNameAndQueue
ThreadMemory captures the notion of a backed thread that _really_
forwards all methods to the backing thread. (Missing methods should be
implemented in a later commit, and allowing them to be implemented
without changing behavior of other derived classes is the main purpose
of this refactor).
ThreadMemoryProvidingNameAndQueue is a ThreadMemory that allows users to
override the thread name. If a name is present, it is used; otherwise
the forwarding behavior is used.
ThreadMemoryProvidingNameAndQueue is a ThreadMemoryProvidingName that
allows users to override queue information. If queue information is
present, it is used; otherwise, the forwarding behavior is used.
With this separation, we can more explicitly implement missing methods
of the base class and override them, if needed, in
ThreadMemoryProvidingNameAndQueue. But this commit really is NFC, no new
methods are implemented and no method implementation is changed.
This reverts commit
87b7f63a11,
reapplying
7e66cf74fb
with a small (and probably temporary)
change to generate more debug info to help with diagnosing buildbot
issues.
The main motivation for this was the inconsistency in handling of
partial reads/writes between the windows and posix implementations
(windows was returning partial reads, posix was trying to fill the
buffer completely). I settle on the windows implementation, as that's
the more common behavior, and the "eager" version can be implemented on
top of that (in most cases, it isn't necessary, since we're writing just
a single byte).
Since this also required auditing the callers to make sure they're
handling partial reads/writes correctly, I used the opportunity to
modernize the function signatures as a forcing function. They now use
the `Timeout` class (basically an `optional<duration>`) to support both
polls (timeout=0) and blocking (timeout=nullopt) operations in a single
function, and use an `Expected` instead of a by-ref result to return the
number of bytes read/written.
As a drive-by, I also fix a problem with the windows implementation
where we were rounding the timeout value down, which meant that calls
could time out slightly sooner than expected.
Current state in scripted process expects [all the
modules](912b154f3a/lldb/source/Plugins/Process/scripted/ScriptedProcess.cpp (L498))
passed into "get_loaded_images" to load successfully else none of them
load. Even if a module loads fine, [but has already been
appended](912b154f3a/lldb/source/Plugins/Process/scripted/ScriptedProcess.cpp (L495))
it still fails. This is restrictive and does not help our usecase.
**Usecase**: We have a parent scripted process using coredump +
tombstone.
1) Scripted process uses child elf-core process to read memory dump
2) Uses tombstones to pass thread names and modules.
We do not know whether the modules will be successfully downloaded
before creating the scripted process. We use [python module
callbacks](a57e58dbfa/lldb/source/Target/Platform.cpp (L1593))
to download a module from symbol server at LLDB load time when the
scripted process is being created. The issue is that if one of the
symbol is not found from the list specified in tombstone, none of the
modules load in scripted process. Even if we ensure symbols are present
in symbol server before creating the scripted process, if the load
address is wrong or if the module is already appended, all module loads
are skipped.
**Solution**: Pass in a custom boolean option arg for every module from
python scripted process plugin which will indicate whether to ignore the
module load error. This will provide the flexibility to user for loading
the successfully fetched modules into target while ignoring the failed
ones
---------
Co-authored-by: rchamala <rachamal@fb.com>
Make StreamAsynchronousIO an unique_ptr instead of a shared_ptr. I tried
passing the class by value, but the llvm::raw_ostream forwarder stored
in the Stream parent class isn't movable and I don't think it's worth
changing that. Additionally, there's a few places that expect a
StreamSP, which are easily created from a StreamUP.
Mach-O corefiles have LC_NOTE metadata, one LC_NOTE that lldb recognizes
is `main bin spec` which can specify that this is a kernel corefile,
userland corefile, or firmware/standalone corefile. With a userland
corefile, the LC_NOTE would specify the virtual address of the dyld
binary's Mach-O header. lldb would create a Module from that in-memory
binary, find the `dyld_all_image_infos` object in dyld's DATA segment,
and use that object to find all of the binaries present in the corefile.
ProcessMachCore takes the metadata from this LC_NOTE and passes the
address to the DynamicLoader plugin via its `GetImageInfoAddress()`
method, so the DynamicLoader can find all of the binaries and load them
in the Target at their correct virtual addresses.
We have a corefile creator who would prefer to specify the address of
`dyld_all_image_infos` directly, instead of specifying the address of
dyld and parsing that to find the object. DynamicLoaderMacOSX, the
DynamicLoader plugin being used here, will accept either a dyld virtual
address or a `dyld_all_image_infos` virtual address from
ProcessMachCore, and do the correct thing with either value.
lldb's process save-core mach-o corefile reader will continue to specify
the virtual address of the dyld binary.
rdar://144322688
lldb today has two rules: When a thread stops at a BreakpointSite, we
set the thread's StopReason to be "breakpoint hit" (regardless if we've
actually hit the breakpoint, or if we've merely stopped *at* the
breakpoint instruction/point and haven't tripped it yet). And second,
when resuming a process, any thread sitting at a BreakpointSite is
silently stepped over the BreakpointSite -- because we've already
flagged the breakpoint hit when we stopped there originally.
In this patch, I change lldb to only set a thread's stop reason to
breakpoint-hit when we've actually executed the instruction/triggered
the breakpoint. When we resume, we only silently step past a
BreakpointSite that we've registered as hit. We preserve this state
across inferior function calls that the user may do while stopped, etc.
Also, when a user adds a new breakpoint at $pc while stopped, or changes
$pc to be the address of a BreakpointSite, we will silently step past
that breakpoint when the process resumes. This is purely a UX call, I
don't think there's any person who wants to set a breakpoint at $pc and
then hit it immediately on resuming.
One non-intuitive UX from this change, butt is necessary: If you're
stopped at a BreakpointSite that has not yet executed, you `stepi`, you
will hit the breakpoint and the pc will not yet advance. This thread has
not completed its stepi, and the ThreadPlanStepInstruction is still on
the stack. If you then `continue` the thread, lldb will now stop and
say, "instruction step completed", one instruction past the
BreakpointSite. You can continue a second time to resume execution.
The bugs driving this change are all from lldb dropping the real stop
reason for a thread and setting it to breakpoint-hit when that was not
the case. Jim hit one where we have an aarch64 watchpoint that triggers
one instruction before a BreakpointSite. On this arch we are notified of
the watchpoint hit after the instruction has been unrolled -- we disable
the watchpoint, instruction step, re-enable the watchpoint and collect
the new value. But now we're on a BreakpointSite so the watchpoint-hit
stop reason is lost.
Another was reported by ZequanWu in
https://discourse.llvm.org/t/lldb-unable-to-break-at-start/78282 we
attach to/launch a process with the pc at a BreakpointSite and
misbehave. Caroline Tice mentioned it is also a problem they've had with
putting a breakpoint on _dl_debug_state.
The change to each Process plugin that does execution control is that
1. If we've stopped at a BreakpointSite that has not been executed yet,
we will call Thread::SetThreadStoppedAtUnexecutedBP(pc) to record that.
When the thread resumes, if the pc is still at the same site, we will
continue, hit the breakpoint, and stop again.
2. When we've actually hit a breakpoint (enabled for this thread or
not), the Process plugin should call
Thread::SetThreadHitBreakpointSite(). When we go to resume the thread,
we will push a step-over-breakpoint ThreadPlan before resuming.
The biggest set of changes is to StopInfoMachException where we
translate a Mach Exception into a stop reason. The Mach exception codes
differ in a few places depending on the target (unambiguously), and I
didn't want to duplicate the new code for each target so I've tested
what mach exceptions we get for each action on each target, and
reorganized StopInfoMachException::CreateStopReasonWithMachException to
document these possible values, and handle them without specializing
based on the target arch.
I first landed this patch in July 2024 via
https://github.com/llvm/llvm-project/pull/96260
but the CI bots and wider testing found a number of test case failures
that needed to be updated, I reverted it. I've fixed all of those issues
in separate PRs and this change should run cleanly on all the CI bots
now.
rdar://123942164
Recognize the visionOS Triple::OSType::XROS os type. Some of these have
already been landed on main, but I reviewed the downstream sources and
there were a few that still needed to be landed upstream.
The maximum number of load/store watchpoints and fetch instruction
watchpoints is 14 each according to LoongArch Reference Manual [1],
so extend the maximum number of watchpoints from 8 to 14 for ptrace.
A new struct user_watch_state_v2 was added into uapi in the related
kernel commit 531936dee53e ("LoongArch: Extend the maximum number of
watchpoints") [2], but there may be no struct user_watch_state_v2 in
the system header in time.
In order to avoid undefined or redefined error, just add a new struct
loongarch_user_watch_state in LLDB which is same with the uapi struct
user_watch_state_v2, then replace the current user_watch_state with
loongarch_user_watch_state.
As far as I can tell, the only users for this struct in the userspace
are GDB and LLDB, there are no any problems of software compatibility
between the application and kernel according to the analysis.
The compatibility problem has been considered while developing and
testing. When the applications in the userspace get watchpoint state,
the length will be specified which is no bigger than the sizeof struct
user_watch_state or user_watch_state_v2, the actual length is assigned
as the minimal value of the application and kernel in the generic code
of ptrace:
```
kernel/ptrace.c: ptrace_regset():
kiov->iov_len = min(kiov->iov_len,
(__kernel_size_t) (regset->n * regset->size));
if (req == PTRACE_GETREGSET)
return copy_regset_to_user(task, view, regset_no, 0,
kiov->iov_len, kiov->iov_base);
else
return copy_regset_from_user(task, view, regset_no, 0,
kiov->iov_len, kiov->iov_base);
```
For example, there are four kind of combinations, all of them work well.
(1) "older kernel + older app", the actual length is 8+(8+8+4+4)*8=200;
(2) "newer kernel + newer app", the actual length is 8+(8+8+4+4)*14=344;
(3) "older kernel + newer app", the actual length is 8+(8+8+4+4)*8=200;
(4) "newer kernel + older app", the actual length is 8+(8+8+4+4)*8=200.
[1]
https://loongson.github.io/LoongArch-Documentation/LoongArch-Vol1-EN.html#control-and-status-registers-related-to-watchpoints
[2]
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=531936dee53e
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
On some OS distros such as LoongArch Fedora 38 mate-5 [1], there are
no macro definitions NT_LOONGARCH_HW_BREAK and NT_LOONGARCH_HW_WATCH
in the system header, then there exist some errors when building LLDB
on LoongArch.
(1) Description of Problem:
```
llvm-project/lldb/source/Plugins/Process/Linux/NativeRegisterContextLinux_loongarch64.cpp:529:16:
error: 'NT_LOONGARCH_HW_WATCH' was not declared in this scope; did you mean 'NT_LOONGARCH_LBT'?
529 | int regset = NT_LOONGARCH_HW_WATCH;
| ^~~~~~~~~~~~~~~~~~~~~
| NT_LOONGARCH_LBT
llvm-project/lldb/source/Plugins/Process/Linux/NativeRegisterContextLinux_loongarch64.cpp:543:12:
error: 'NT_LOONGARCH_HW_BREAK' was not declared in this scope; did you mean 'NT_LOONGARCH_CSR'?
543 | regset = NT_LOONGARCH_HW_BREAK;
| ^~~~~~~~~~~~~~~~~~~~~
| NT_LOONGARCH_CSR
```
(2) Steps to Reproduce:
```
git clone https://github.com/llvm/llvm-project.git
mkdir -p llvm-project/llvm/build && cd llvm-project/llvm/build
cmake .. -G "Ninja" \
-DCMAKE_BUILD_TYPE=Release \
-DLLVM_BUILD_RUNTIME=OFF \
-DLLVM_ENABLE_PROJECTS="clang;lldb" \
-DCMAKE_INSTALL_PREFIX=/usr/local/llvm \
-DLLVM_TARGETS_TO_BUILD="LoongArch" \
-DLLVM_HOST_TRIPLE=loongarch64-redhat-linux
ninja
```
(3) Additional Info:
Maybe there are no problems on the OS distros with newer glibc devel
library, so this issue is related with OS distros.
(4) Root Cause Analysis:
This is because the related Linux kernel commit [2] was merged in
2023-02-25 and the glibc devel library has some delay with kernel,
the glibc version of specified OS distros is not updated in time.
(5) Final Solution:
One way is to ask the maintainer of OS distros to update glibc devel
library, but it is better to not depend on the glibc version.
In order to avoid the build errors, just define NT_LOONGARCH_HW_BREAK
and NT_LOONGARCH_HW_WATCH in LLDB if there are no these definitions in
the system header.
By the way, in order to fit within 80 columns, use C++-style comments
for the new added NT_LOONGARCH_HW_BREAK and NT_LOONGARCH_HW_WATCH.
While at it, for consistency, just modify the current NT_LOONGARCH_LSX
and NT_LOONGARCH_LASX to C++-style comments too.
[1]
https://mirrors.wsyu.edu.cn/fedora/linux/development/rawhide/Everything/loongarch64/iso/livecd-fedora-mate-5.loongarch64.iso
[2]
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1a69f7a161a7
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
These prevented ThreadMemory from correctly returning the
Name/Queue/Info of the backing thread.
Note about testing: this test only finds regressions if the system sets
a name or queue for the backing thread. While this may not be true
everywhere, it still provides coverage in some systems, e.g. in Apple
platforms.
Many uses of SC::GetAddressRange were not interested in the range, but
in the address of the function/symbol contained inside the symbol
context. They were getting that by calling the GetBaseAddress on the
returned range, which worked well enough so far, but isn't compatible
with discontinuous functions, whose address (entry point) may not be the
lowest address in the range.
To resolve this problem, this PR creates a new function whose purpose is
return the address of the function or symbol inside the symbol context.
It also changes all of the callers of GetAddressRange which do not
actually care about the range to call this function instead.
Generally speaking, process plugins (e.g. ProcessGDBRemote) should not
be aware of OS plugin threads. However, ProcessGDBRemote attempts to
check for the existence of OS threads when calculating stop info. When
OS threads are present, it sets the stop info directly on the OS plugin
thread and leaves the ThreadGDBRemote without a StopInfo.
This is problematic for a few reasons:
1. No other process plugins do this, as they shouldn't. They should set
the stop info for their own process threads, and let the abstractions
built on top propagate StopInfos.
2. This conflicts with the expectations of ThreadMemory, which checks
for the backing threads's info, and then attempts to propagate it (in
the future, it should probably ask the plugin itself too...). We see
this happening in the code below. The `if` condition will not trigger,
because `backing_stop_info_sp` will be null (remember, ProcessGDB remote
is ignoring its own threads), and then this method returns false.
```
bool ThreadMemory::CalculateStopInfo() {
...
lldb::StopInfoSP backing_stop_info_sp(
m_backing_thread_sp->GetPrivateStopInfo());
if (backing_stop_info_sp &&
backing_stop_info_sp->IsValidForOperatingSystemThread(*this)) {
backing_stop_info_sp->SetThread(shared_from_this());
```
```
Thread::GetPrivateStopInfo
...
if (!CalculateStopInfo())
SetStopInfo(StopInfoSP());
```
To solve this, we change ProcessGDB remote so that it does the
principled thing: it now only sets the stop info of its own threads.
This change by itself breaks the tests TestPythonOSPlugin.py and
TestOSPluginStepping.py and probably explains why ProcessGDB had
originally "violated" this isolation of layers.
To make this work, BreakpointSites must be aware of BackingThreads when
answering the question: "Is this breakpoint valid for this thread?".
Why? Breakpoints are created on top of the OS threads (that's what the
user sees), but breakpoints are hit by process threads. In the presence
of OS threads, a TID-specific breakpoint is valid for a process thread
if it is backing an OS thread with that TID.
This patch fixes LLDB Windows build with MSVC compiler. MSVC deletes
the default constructor due to virtual inheritance rules. Explicitly
define the default constructor in NativeRegisterContextWindows to
ensure constructibility.
This reverts commit a774de807e.
This is the same changes as last time, plus:
* We load the binary into the target object so that on Windows, we can
resolve the locations of the functions.
* We now assert that each required breakpoint has at least 1 location,
to prevent an issue like that in the future.
* We are less strict about the unsupported error message, because it
prints "error: windows" on Windows instead of "error: gdb-remote".
This PR adds support for hardware watchpoints in LLDB for AArch64
Windows targets.
Windows does not provide an API to query the number of available
hardware watchpoints supported by underlying hardware platform.
Therefore, current implementation supports only a single hardware
watchpoint, which has been verified on Windows 11 using Microsoft
SQ2 and Snapdragon Elite X hardware.
LLDB test suite ninja check-lldb still fails watchpoint-related tests.
However, tests that do not require more than a single watchpoint
pass successfully when run individually.
Reverts llvm/llvm-project#123945
Has failed on the Windows on Arm buildbot:
https://lab.llvm.org/buildbot/#/builders/141/builds/5865
```
********************
Unresolved Tests (2):
lldb-api :: functionalities/reverse-execution/TestReverseContinueBreakpoints.py
lldb-api :: functionalities/reverse-execution/TestReverseContinueWatchpoints.py
********************
Failed Tests (1):
lldb-api :: functionalities/reverse-execution/TestReverseContinueNotSupported.py
```
Reverting while I reproduce locally.
This reverts commit 22561cfb44 and fixes
b7b9ccf449 (#112079).
The problem is that x86_64 and Arm 32-bit have memory regions above the
stack that are readable but not writeable. First Arm:
```
(lldb) memory region --all
<...>
[0x00000000fffcf000-0x00000000ffff0000) rw- [stack]
[0x00000000ffff0000-0x00000000ffff1000) r-x [vectors]
[0x00000000ffff1000-0xffffffffffffffff) ---
```
Then x86_64:
```
$ cat /proc/self/maps
<...>
7ffdcd148000-7ffdcd16a000 rw-p 00000000 00:00 0 [stack]
7ffdcd193000-7ffdcd196000 r--p 00000000 00:00 0 [vvar]
7ffdcd196000-7ffdcd197000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 --xp 00000000 00:00 0 [vsyscall]
```
Compare this to AArch64 where the test did pass:
```
$ cat /proc/self/maps
<...>
ffffb87dc000-ffffb87dd000 r--p 00000000 00:00 0 [vvar]
ffffb87dd000-ffffb87de000 r-xp 00000000 00:00 0 [vdso]
ffffb87de000-ffffb87e0000 r--p 0002a000 00:3c 76927217 /usr/lib/aarch64-linux-gnu/ld-linux-aarch64.so.1
ffffb87e0000-ffffb87e2000 rw-p 0002c000 00:3c 76927217 /usr/lib/aarch64-linux-gnu/ld-linux-aarch64.so.1
fffff4216000-fffff4237000 rw-p 00000000 00:00 0 [stack]
```
To solve this, look up the memory region of the stack pointer (using
https://lldb.llvm.org/resources/lldbgdbremote.html#qmemoryregioninfo-addr)
and constrain the read to within that region. Since we know the stack is
all readable and writeable.
I have also added skipIfRemote to the tests, since getting them working
in that context is too complex to be worth it.
Memory write failures now display the range they tried to write, and
register write errors will show the name of the register where possible.
The patch also includes a workaround for a an issue where the test code
could mistake an `x` response that happens to begin with an `O` for an
output packet (stdout). This workaround will not be necessary one we
start using the [new
implementation](https://discourse.llvm.org/t/rfc-fixing-incompatibilties-of-the-x-packet-w-r-t-gdb/84288)
of the `x` packet.
---------
Co-authored-by: Pavel Labath <pavel@labath.sk>
Fixes c5840cc609.
On platforms where UL is 32 bit, like Windows or 32 bit Linux,
this shift was not correct, so we assumed GCS was not present.
Use ULL instead, to match the other HWCAP constants.
The features and locked registers hold the same bits, the latter
is a lock for the former. Tested with core files and live processes.
I thought about setting a non-zero lock register in the core file,
however:
* We can be pretty sure it's reading correctly because its between
the 2 other GCS registers in the same core file note.
* I can't make the test case modify lock bits because userspace
can't clear them (without using ptrace) and we don't know what the libc
has locked
(probably all feature bits).
This allows you to read the same registers as you would for a live
process.
As the content of proc/pid/smaps is not included in the core file, we
don't get the "ss" marker that tell us that it is shadow stack. The GCS
region is still in the list though.
When the Guarded Control Stack (GCS) is enabled, returns cause the
processor to validate that the address at the location pointed to by
gcspr_el0 matches the one in the link register.
```
ret (lr=A) << pc
| GCS |
+=====+
| A |
| B | << gcspr_el0
Fault: tried to return to A when you should have returned to B.
```
Therefore when an expression wrapper function tries to return to the
expression return address (usually `_start` if there is a libc), it
would fault.
```
ret (lr=_start) << pc
| GCS |
+============+
| user_func1 |
| user_func2 | << gcspr_el0
Fault: tried to return to _start when you should have returned to user_func2.
```
To fix this we must push that return address to the GCS in
PrepareTrivialCall. This value is then consumed by the final return and
the expression completes as expected.
If for some reason that fails, we will manually restore the value of
gcspr_el0, because it turns out that PrepareTrivialCall
does not restore registers if it fails at all. So for now I am handling
gcspr_el0 specifically, but I have filed
https://github.com/llvm/llvm-project/issues/124269 to address the
general problem.
(the other things PrepareTrivialCall does are exceedingly likely to not
fail, so we have never noticed this)
```
ret (lr=_start) << pc
| GCS |
+============+
| user_func1 |
| user_func2 |
| _start | << gcspr_el0
No fault, we return to _start as normal.
```
The gcspr_el0 register will be restored after expression evaluation so
that the program can continue correctly.
However, due to restrictions in the Linux GCS ABI, we will not restore
the enable bit of gcs_features_enabled. Re-enabling GCS via ptrace is
not supported because it requires memory to be allocated by the kernel.
We could disable GCS if the expression enabled GCS, however this would
use up that state transition that the program might later rely on. And
generally it is cleaner to ignore the enable bit, rather than one state
transition of it.
We will also not restore the GCS entry that was overwritten with the
expression's return address. On the grounds that:
* This entry will never be used by the program. If the program branches,
the entry will be overwritten. If the program returns, gcspr_el0 will
point to the entry before the expression return address and that entry
will instead be validated.
* Any expression that calls functions will overwrite even more entries,
so the user needs to be aware of that anyway if they want to preserve
the contents of the GCS for inspection.
* An expression could leave the program in a state where restoring the
value makes the situation worse. Especially if we ever support this in
bare metal debugging.
I will later document all this on
https://lldb.llvm.org/use/aarch64-linux.html.
Tests have been added for:
* A function call that does not interact with GCS.
* A call that does, and disables it (we do not re-enable it).
* A call that does, and enables it (we do not disable it again).
* Failure to push an entry to the GCS stack.
The Guarded Control Stack extension implements a shadow stack and the
Linux kernel provides access to 3 registers for it via ptrace.
struct user_gcs {
__u64 features_enabled;
__u64 features_locked;
__u64 gcspr_el0;
};
This commit adds support for reading those from a live process.
The first 2 are pseudo registers based on the real control register and
the 3rd is a real register. This is the stack pointer for the guarded
stack.
I have added a "gcs_" prefix to the "features" registers so that they
have a clear name when shown individually. Also this means they will tab
complete from "gcs", and be next to gcspr_el0 in any sorted lists of
registers.
Guarded Control Stack Registers:
gcs_features_enabled = 0x0000000000000000
gcs_features_locked = 0x0000000000000000
gcspr_el0 = 0x0000000000000000
Testing is more of the usual, where possible I'm writing a register then
doing something in the program to confirm the value was actually sent to
ptrace.
.. by changing the signal stop reason format 🤦
The reason this did not work is because the code in
`StopInfo::GetCrashingDereference` was looking for the string "address="
to extract the address of the crash. Macos stop reason strings have the
form
```
EXC_BAD_ACCESS (code=1, address=0xdead)
```
while on linux they look like:
```
signal SIGSEGV: address not mapped to object (fault address: 0xdead)
```
Extracting the address from a string sounds like a bad idea, but I
suppose there's some value in using a consistent format across
platforms, so this patch changes the signal format to use the equals
sign as well. All of the diagnose tests pass except one, which appears
to fail due to something similar #115453 (disassembler reports
unrelocated call targets).
I've left the tests disabled on windows, as the stop reason reporting
code works very differently there, and I suspect it won't work out of
the box. If I'm wrong -- the XFAIL will let us know.
This commit adds support for a
`SBProcess::ContinueInDirection()` API. A user-accessible command for
this will follow in a later commit.
This feature depends on a gdbserver implementation (e.g. `rr`) providing
support for the `bc` and `bs` packets. `lldb-server` does not support
those packets, and there is no plan to change that. For testing
purposes, this commit adds a Python implementation of *very limited*
record-and-reverse-execute functionality, implemented as a proxy between
lldb and lldb-server in `lldbreverse.py`. This should not (and in
practice cannot) be used for anything except testing.
The tests here are quite minimal but we test that simple breakpoints and
watchpoints work as expected during reverse execution, and that
conditional breakpoints and watchpoints work when the condition calls a
function that must be executed in the forward direction.
This will be sent by Arm's Guarded Control Stack extension when an
invalid return is executed.
The signal does have an address we could show, but it's the PC at which
the fault occured. The debugger has plenty of ways to show you that
already, so I've left it out.
```
(lldb) c
Process 460 resuming
Process 460 stopped
* thread #1, name = 'test', stop reason = signal SIGSEGV: control protection fault
frame #0: 0x0000000000400784 test`main at main.c:57:1
54 afunc();
55 printf("return from main\n");
56 return 0;
-> 57 }
(lldb) dis
<...>
-> 0x400784 <+100>: ret
```
The new test case generates the signal by corrupting the link register
then attempting to return. This will work whether we manually enable GCS
or the C library does it for us.
(in the former case you could just return from main and it would fault)
Lots of code around LLDB was directly accessing the target's section
load list. This NFC patch makes the section load list private so the
Target class can access it, but everyone else now uses accessor
functions. This allows us to control the resolving of addresses and will
allow for functionality in LLDB which can lazily resolve addresses in
JIT plug-ins with a future patch.
In the presence of OS plugins, StopInfoMachException currently
propagates breakpoint stop reasons even if those breakpoints were not
intended for a specific thread, effectively removing our ability to set
thread-specific breakpoints.
This was originally added in [1], but the motivation provided in the
comment does not seem strong enough to remove the ability to set
thread-specific breakpoints. The only way to break thread specific
breakpoints would be if a user set such a breakpoint and _then_ loaded
an OS plugin, a scenario which we would likely not want to support.
[1]:
ab745c2ad8 (diff-8ec6e41b1dffa7ac4b5841aae24d66442ef7ebc62c8618f89354d84594f91050R501)
This is intended for use with Arm's Guarded Control Stack extension
(GCS). Which reuses some existing shadow stack support in Linux. It
should also work with the x86 equivalent.
A "ss" flag is added to the "VmFlags" line of shadow stack memory
regions in `/proc/<pid>/smaps`. To keep the naming generic I've called
it shadow stack instead of guarded control stack.
Also the wording is "shadow stack: yes" because the shadow stack region
is just where it's stored. It's enabled for the whole process or it
isn't. As opposed to memory tagging which can be enabled per region, so
"memory tagging: enabled" fits better for that.
I've added a test case that is also intended to be the start of a set of
tests for GCS. This should help me avoid duplicating the inline assembly
needed.
Note that no special compiler support is needed for the test. However,
for the intial enabling of GCS (assuming the libc isn't doing it) we do
need to use an inline assembly version of prctl.
This is because as soon as you enable GCS, all returns are checked
against the GCS. If the GCS is empty, the program will fault. In other
words, you can never return from the function that enabled GCS, unless
you push values onto it (which is possible but not needed here).
So you cannot use the libc's prctl wrapper for this reason. You can use
that wrapper for anything else, as we do to check if GCS is enabled.
With this patch, vector registers can be read and written when debugging a live process.
Note: We currently assume that all LoongArch64 processors include the
LSX and LASX extensions.
To add test cases, the following modifications were also made:
lldb/packages/Python/lldbsuite/test/lldbtest.py
lldb/packages/Python/lldbsuite/test/make/Makefile.rules
Reviewed By: DavidSpickett, SixWeining
Pull Request: https://github.com/llvm/llvm-project/pull/120664
This commit addresses a bug introduced in commit bcf654c, which
prevented LLDB from parsing the GNU build ID for the main executable
from a core file. The fix finds the `p_vaddr` of the first `PT_LOAD`
segment as the `base_addr` and subtract this `base_addr` from the
virtual address being read.
Co-authored-by: George Hu <hyubo@meta.com>
In #119598 my recent TLS feature seems to break crashpad symbols. I have
a few ideas on how this is happening, but for now as a mitigation I'm
checking if the Minidump was LLDB generated, and if so leveraging the
dynamic loader.
Since the setup of debug registers for AArch64 and LoongArch is similar,
we extracted the shared logic from Class:
`NativeRegisterContextDBReg_arm64`
into a new Class:
`NativeRegisterContextDBReg`.
This will simplify the subsequent implementation of hardware breakpoints
and watchpoints on LoongArch.
Reviewed By: DavidSpickett
Pull Request: https://github.com/llvm/llvm-project/pull/118043
It's never set to true. Also, using inheritable FDs in a multithreaded
process pretty much guarantees descriptor leaks. It's better to
explicitly pass a specific FD to a specific subprocess, which we already
mostly can do using the ProcessLaunchInfo FileActions.
Allows us to stop waiting for a connection if it doesn't come in a
certain amount of time. Right now, I'm keeping the status quo (infitnite
wait) in the "production" code, but using smaller (finite) values in
tests. (A lot of these tests create "loopback" connections, where a
really short wait is sufficient: on linux at least even a poll (0s wait)
is sufficient if the other end has connect()ed already, but this doesn't
seem to be the case on Windows, so I'm using a 1s wait in these cases).
On #110065 the changes to LinuxSigInfo Struct introduced some variables
that will differ in size on 32b or 64b. I've rectified this by setting
them all to build independent types.
ELF core debugging fix#117070 broke TestLoadUnload.py tests due to
GetModuleSpec call, ProcessGDBRemote fetches modules from remote. Revise
the original PR, renamed FindBuildId to FindModuleUUID.