This reverts commit d6713ad80d.
This changed was reverted because of greendragon failures such
as
Unresolved Tests (2):
lldb-api :: debuginfod/Normal/TestDebuginfod.py
lldb-api :: debuginfod/SplitDWARF/TestDebuginfodDWP.py
I believe I've got the tests properly configured to only run on Linux
x86(_64), as I don't have a Linux AArch64/Arm device to diagnose what's
going wrong with the tests (I suspect there's some issue with generating
`.note.gnu.build-id` sections...)
The actual code fixes have now been reviewed 3 times:
https://github.com/llvm/llvm-project/pull/79181 (moved shell tests to
API tests), https://github.com/llvm/llvm-project/pull/85693 (Changed
some of the testing infra), and
https://github.com/llvm/llvm-project/pull/86812 (didn't get the tests
configured quite right). The Debuginfod integration for symbol
acquisition in LLDB now works with the `executable` and `debuginfo`
Debuginfod network requests working properly for normal, `objcopy
--only-keep-debug` stripped, split-dwarf, and `objcopy
--only-keep-debug` stripped *plus* split-dwarf symbols/binaries.
The reasons for the multiple attempts have been tests on platforms I
don't have access to (Linux AArch64/Arm + MacOS x86_64). I believe I've
got the tests properly disabled for everything except for Linux x86(_64)
now. I've built & tested on MacOS AArch64 and Linux x86_64.
---------
Co-authored-by: Kevin Frei <freik@meta.com>
This is a followup to
https://github.com/llvm/llvm-project/pull/86359
"[lldb] [ObjectFileMachO] LLVM_COV is not mapped into firmware memory
(#86359)"
where I treat LLVM_COV segments in a Mach-O binary as non-loadable.
There is another codepath in
`DynamicLoaderStatic::LoadAllImagesAtFileAddresses` which is called to
set the load addresses for a Module to the file addresses. It has no
logic to detect a segment that is not loaded in virtual memory
(ObjectFileMachO::SectionIsLoadable), so it would set the load address
for this LLVM_COV segment to the file address and shadow actual code,
breaking lldb behavior.
This method currently sets the load address for any section that doesn't
have a load address set already. This presumes that a Module was added
to the Target, some mechanism set the correct load address for SOME
segments, and then this method is going to set the other segments to a
no-slide value, assuming they were forgotten.
ObjectFile base class doesn't, today, vend a SectionIsLoadable method,
but we do have ObjectFile::SetLoadAddress and at a higher level,
Module::SetLoadAddress, when we're setting the same slide to all
segments.
That's the behavior we want in this method. If any section has a load
address, we don't touch this Module. Otherwise we set all sections to
have a load address that is the same as the file address.
I also audited the other parts of lldb that are calling
SectionList::SectionLoadAddress and looked if they should be more
correctly using Module::SetLoadAddress for the entire binary. But in
most cases, we have the potential for different slides for different
sections so this section-by-section approach must be taken.
rdar://125800290
The previous diff (and it's subsequent fix) were reverted as the tests
didn't work properly on the AArch64 & ARM LLDB buildbots. I made a
couple more minor changes to tests (from @clayborg's feedback) and
disabled them for non Linux-x86(_64) builds, as I don't have the ability
do anything about an ARM64 Linux failure. If I had to guess, I'd say the
toolchain on the buildbots isn't respecting the `-Wl,--build-id` flag.
Maybe, one day, when I have a Linux AArch64 system I'll dig in to it.
From the reverted PR:
I've migrated the tests in my
https://github.com/llvm/llvm-project/pull/79181 from shell to API (at
@JDevlieghere's suggestion) and addressed a couple issues that were
exposed during testing.
The tests first test the "normal" situation (no DebugInfoD involvement,
just normal debug files sitting around), then the "no debug info"
situation (to make sure the test is seeing failure properly), then it
tests to validate that when DebugInfoD returns the symbols, things work
properly. This is duplicated for DWP/split-dwarf scenarios.
---------
Co-authored-by: Kevin Frei <freik@meta.com>
This patch introduces a new `IterationMarker` enum (happy to take
alternative name suggestions), which callbacks, like the one in
`SymbolFileDWARFDebugMap::ForEachSymbolFile`, can return in order to
indicate whether the caller should continue iterating or bail.
For now this patch just changes the `ForEachSymbolFile` callback to use
this new enum. In the future we could change the various
`DWARFIndex::GetXXX` callbacks to do the same.
This makes the callbacks easier to read and hopefully reduces the chance
of bugs like https://github.com/llvm/llvm-project/pull/87177.
While adding register fields I realised that the AUXV values for Linux
and FreeBSD disagree here.
So I've added a FreeBSD specific HWCAP value that I can use from FreeBSD
specific code.
The alternative is translating GetAuxValue calls depending on platform,
which requires that we know what we are at all times.
Another way would be to convert the entries' values when we construct
the AuxVector but the platform specific call that reads the data just
returns a raw array. So adding another layer here is more disruption.
These are probably actually unreachable - perhaps an lldb developer
would be interested in rephrasing this change to move the new cases into
some unreachable/unsupported bucket, rather than my half-hearted guess
at what the desired behavior would be (completely untested, because
they're probably untestable/unreachable - maybe debugging from modules?)
We have the ability to load .dwp files with a .debug_info.dwo section
that exceeds 4GB. There were 4 locations that were using 32 bit offsets
and lengths to extract variable locations, and if a DIE was over the 4GB
barrier, we would truncate the block offset for the variable locations
and the variable expression would be garbage. This fixes the issues. It
isn't possible to add a test for this as we don't want to create a 4GB
.dwp file on test machines.
An inverted condition causes `SymbolFileDWARFDebugMap::FindTypes` to
bail out after inspecting the first .o file in each module.
The same kind of bug is found in
`SymbolFileDWARFDebugMap::ParseDeclsForContext`.
Correct both early exit conditions and add a regression test for lookup
of up a type defined in a secondary compilation unit.
Fixes#87176
This is fixing a report from ubsan which I don't think is super high
value, but our testsuite hits it on
TestDataFormatterObjCNSContainer.py so I'd like to work around it. We
are getting
```
runtime error: left shift of negative value -8827055269646171913
3159 int64_t data_payload_signed =
3160 ((int64_t)((int64_t)unobfuscated
-> 3161 << m_objc_debug_taggedpointer_ext_payload_lshift) >>
3162 m_objc_debug_taggedpointer_ext_payload_rshift);
```
At this point `unobfuscated` is 0x85800000000000f7 and
`m_objc_debug_taggedpointer_ext_payload_lshift` is 9, so
`(int64_t)0x85800000000000f7<<9` shifts off the "sign" bit and then some
zeroes etc, and that's how we get this error.
We're only trying to extract some bits in the middle of the doubleword,
so the fact that we're "losing" the sign is not a bug. Change the inner
cast to (uint64_t).
It is possible to gather code coverage in a firmware environment, where
the __LLVM_COV segment will not be mapped in memory but does exist in
the binary, see
https://llvm.org/devmtg/2020-09/slides/PhippsAlan_EmbeddedCodeCoverage_LLVM_Conf_Talk_final.pdf
The __LLVM_COV segment in the binary happens to be at the same address
as the __DATA segment, so if lldb treats this segment as loaded, it
shadows the __DATA segment and address->symbol resolution can fail.
For these non-userland code cases, we need to mark __LLVM_COV as not a
loadable segment.
rdar://124475661
commit e66b670f3b
Author: Nathan Lanza <nathanlanza@gmail.com>
Date: Thu Mar 21 19:53:48 2024 -0400
triggers:
lldb/source/Plugins/TypeSystem/Clang/TypeSystemClang.cpp:478:16:
error: enumeration value 'CIR' not handled in switch
[-Werror,-Wswitch]
This patch teaches lldb to handle clang::Language::CIR the same way as
clang::Language::LLVM_IR.
Finally getting back to Debuginfod tests:
I've migrated the tests in my [earlier
PR](https://github.com/llvm/llvm-project/pull/79181) from shell to API
(at @JDevlieghere's suggestion) and addressed a couple issues that came
about during testing.
The tests first test the "normal" situation (no DebugInfoD involvement,
just normal debug files sitting around), then the "no debug info"
situation (to make sure the test is seeing failure properly), then it
tests to validate that when Debuginfod returns the symbols, things work
properly. This is duplicated for DWP/split-dwarf scenarios.
---------
Co-authored-by: Kevin Frei <freik@meta.com>
This is another step towards supporting DWARF5 checksums and inline
source code in LLDB. This is a reland of #85468 but without the
functional change of storing the support file from the line table (yet).
AddressableBits is in the Utility module of LLDB. It currently directly
refers to Process, which is from the Target LLDB module. This is a
layering violation which concretely means that it is impossible to link
anything that uses Utility without it also using Target as well. This is
generally not an issue for LLDB (since everything is built together) but
it may make it difficult to write unit tests for AddressableBits later
on.
This caused following warnings in an LLDB build:
```
[237/1072] Building CXX object tools/l...lusLanguage.dir/LibCxxSliceArray.cpp.o
/Volumes/Data/llvm-project/lldb/source/Plugins/Language/CPlusPlus/LibCxxSliceArray.cpp:38:53: warning: format specifies type 'unsigned long long' but the argument has type 'size_t' (aka 'unsigned long') [-Wformat]
38 | stream.Printf("stride=%" PRIu64 " size=%" PRIu64, stride, size);
| ~~~~~~~~~ ^~~~~~
/Volumes/Data/llvm-project/lldb/source/Plugins/Language/CPlusPlus/LibCxxSliceArray.cpp:38:61: warning: format specifies type 'unsigned long long' but the argument has type 'size_t' (aka 'unsigned long') [-Wformat]
38 | stream.Printf("stride=%" PRIu64 " size=%" PRIu64, stride, size);
| ~~~~~~~~~ ^~~~
2 warnings generated.
```
This patch simply changes the format specifiers to use the `%zu` for
`size_t`s.
Currently, we always show the argument passed to dsymForUUID in the
corresponding progress update. Most of the time this is a UUID, but it
can also be an absolute path. The former is pretty uninformative and the
latter needlessly noisy.
This changes the progress update to print the UUID and the module name,
if both are available. Otherwise, we print the UUID or the module name
depending on which one is available.
We now also unconditionally pass the module file spec and architecture
to DownloadObjectAndSymbolFile, while previously this was conditional on
the file existing on-disk. This should be harmless:
- We already check that the file exists in DownloadObjectAndSymbolFile.
- It doesn't make sense to check the filesystem for the architecutre.
rdar://124643548
Darwin AArch64 application processors are run with Top Byte Ignore mode
enabled so metadata may be stored in the top byte, it needs to be
ignored when reading/writing memory. David Spickett handled this already
in the base class Process::ReadMemory but ProcessMachCore overrides that
method (to avoid the memory cache) and did not pick up the same change.
I add a test case that creates a pointer with metadata in the top byte
and dereferences it with a live process and with a corefile.
rdar://123784501
symbol
This patch puts the default breakpoint on the
sanitizers_address_on_report symbol, and uses the old symbol as a backup
if the default case is not found
rdar://123911522
Walter Erquinigo added optional instruction annotations for x86
instructions in 2022 for the `thread trace dump instruction` command,
and code to DisassemblerLLVMC to add annotations for instructions that
change flow control, v. https://reviews.llvm.org/D128477
This was added as an option to `disassemble`, and the trace dump command
enables it by default, but several other instruction dumpers were
changed to display them by default as well. These are only implemented
for Intel instructions, so our disassembly on other targets ends up
looking like
```
(lldb) x/5i 0x1000086e4
0x1000086e4: 0xa9be6ffc unknown stp x28, x27, [sp, #-0x20]!
0x1000086e8: 0xa9017bfd unknown stp x29, x30, [sp, #0x10]
0x1000086ec: 0x910043fd unknown add x29, sp, #0x10
0x1000086f0: 0xd11843ff unknown sub sp, sp, #0x610
0x1000086f4: 0x910c63e8 unknown add x8, sp, #0x318
```
instead of `disassemble`'s output style of
```
lldb`main:
lldb[0x1000086e4] <+0>: stp x28, x27, [sp, #-0x20]!
lldb[0x1000086e8] <+4>: stp x29, x30, [sp, #0x10]
lldb[0x1000086ec] <+8>: add x29, sp, #0x10
lldb[0x1000086f0] <+12>: sub sp, sp, #0x610
lldb[0x1000086f4] <+16>: add x8, sp, #0x318
```
Adding symbolic annotations for assembly instructions is something I'm
interested in too, because we may have users investigating a crash or
apparent-incorrect behavior who must debug optimized assembly and they
may not be familiar with the ISA they're using, so short of flipping
through a many-thousand-page PDF to understand each instruction, they're
lost. They don't write assembly or work at that level, but to understand
a bug, they have to understand what the instructions are actually doing.
But the annotations that exist today don't move us forward much on that
front - I'd argue that the flow control instructions on Intel are not
hard to understand from their names, but that might just be my personal
bias. Much trickier instructions exist in any event.
Displaying this information by default for all targets when we only have
one class of instructions on one target is not a good default.
Also, in 2011 when Greg implemented the `memory read -f i` (aka `x/i`)
command
```
commit 5009f9d501
Author: Greg Clayton <gclayton@apple.com>
Date: Thu Oct 27 17:55:14 2011 +0000
[...]
eFormatInstruction will print out disassembly with bytes and it will use the
current target's architecture. The format character for this is "i" (which
used to be being used for the integer format, but the integer format also has
"d", so we gave the "i" format to disassembly), the long format is
"instruction".
```
he had DumpDataExtractor's DumpInstructions print the bytes of the
instruction -- that's the first field we see above for the `x/5i` after
the address -- and this is only useful for people who are debugging the
disassembler itself, I would argue. I don't want this displayed by
default either.
tl;dr this patch removes both fields from `memory read -f -i` and I
think this is the right call today. While I'm really interested in
instruction annotation, I don't think `x/i` is the right place to have
it enabled by default unless it's really compelling on at least some of
our major targets.
Change GetNumChildren()/CalculateNumChildren() methods return
llvm::Expected
This is an NFC change that does not yet add any error handling or change
any code to return any errors.
This is the second big change in the patch series started with
https://github.com/llvm/llvm-project/pull/83501
A follow-up PR will wire up error handling.
Change GetNumChildren()/CalculateNumChildren() methods return
llvm::Expected
This is an NFC change that does not yet add any error handling or change
any code to return any errors.
This is the second big change in the patch series started with
https://github.com/llvm/llvm-project/pull/83501
A follow-up PR will wire up error handling.
535da10842 introduced a check, when execve fails,
to see if we are allowed to trace programs at all.
Unfortunately because we like to call Status vars "error" and "status"
in various combinations, one got misnamed. This lead to lldb-server trying
to make an error value out of a success value when you did the following:
```
$ ./bin/lldb-server gdbserver 127.0.0.1:1234 -- is_not_a_file
Assertion failed: (Err && "Cannot create Expected<T> from Error success value."), function Expected...
```
This happened because the execve fails, but the check whether we can
trace says yes we can trace, but then we use the Status from the check
to create the return value. That Status is in fact a success value not
the failed Status we got from the execve attempt.
With the name corrected you now get:
```
$ ./bin/lldb-server gdbserver 127.0.0.1:1234 -- is_not_a_file
error: failed to launch 'is_not_a_file': execve failed: No such file or directory
```
Which is what we expect to see. This also fixes the test `TestGDBRemoteLaunch.py`
when asserts are enabled.
DWP files don't usually have a GNU build ID built into them. When
searching for a .dwp file, don't require a UUID to be in the .dwp file.
The debug info search information was checking for a UUID in the .dwp
file when debug info search paths were being used. This is now fixed by
not specifying the UUID in the ModuleSpec being used for the .dwp file
search.
We got user reporting lldb crash while the debuggee is calling vfork()
concurrently from multiple threads.
The crash happens because the current implementation can only handle
single vfork, vforkdone protocol transaction.
This diff fixes the crash by lldb-server storing forked debuggee's <pid,
tid> pair in jstopinfo which will be decoded by lldb client to create
StopInfoVFork for follow parent/child policy. Each StopInfoVFork will
later have a corresponding vforkdone packet. So the patch also changes
the `m_vfork_in_progress` to be reference counting based.
Two new test cases are added which crash/assert without the changes in
this patch.
---------
Co-authored-by: jeffreytan81 <jeffreytan@fb.com>
[lldb] Add SBProcess methods for get/set/use address masks (#83095)
I'm reviving a patch from phabracator, https://reviews.llvm.org/D155905
which was approved but I wasn't thrilled with all the API I was adding
to SBProcess for all of the address mask types / memory regions. In this
update, I added enums to control type address mask type (code, data,
any) and address space specifiers (low, high, all) with defaulted
arguments for the most common case. I originally landed this via
https://github.com/llvm/llvm-project/pull/83095 but it failed on CIs
outside of arm64 Darwin so I had to debug it on more environments
and update the patch.
This patch is also fixing a bug in the "addressable bits to address
mask" calculation I added in AddressableBits::SetProcessMasks. If lldb
were told that 64 bits are valid for addressing, this method would
overflow the calculation and set an invalid mask. Added tests to check
this specific bug while I was adding these APIs.
This patch changes the value of "no mask set" from 0 to
LLDB_INVALID_ADDRESS_MASK, which is UINT64_MAX. A mask of all 1's
means "no bits are used for addressing" which is an impossible mask,
whereas a mask of 0 means "all bits are used for addressing" which
is possible.
I added a base class implementation of ABI::FixCodeAddress and
ABI::FixDataAddress that will apply the Process mask values if they
are set to a value other than LLDB_INVALID_ADDRESS_MASK.
I updated all the callers/users of the Mask methods which were
handling a value of 0 to mean invalid mask to use
LLDB_INVALID_ADDRESS_MASK.
I added code to the all AArch64 ABI Fix* methods to apply the
Highmem masks if they have been set. These will not be set on a
Linux environment, but in TestAddressMasks.py I test the highmem
masks feature for any AArch64 target, so all AArch64 ABI plugins
must handle it.
rdar://123530562
Currently, for x86 and x86_64 triples, "+sse" and "+sse2" are appended
to `Features` vector of `TargetOptions` unconditionally. This vector is
later reset in `TargetInfo::CreateTargetInfo` and filled using info from
`FeaturesAsWritten` vector, so previous modifications of the `Features`
vector have no effect. For x86_64 triple, we append "sse2"
unconditionally in `X86TargetInfo::initFeatureMap`, so despite the
`Features` vector reset, we still have the desired sse features enabled.
The corresponding code in `X86TargetInfo::initFeatureMap` is marked as
FIXME, so we should not probably rely on it and should set desired
features properly in `ClangExpressionParser`.
This patch changes the vector the features are appended to from
`Features` to `FeaturesAsWritten`. It's not reset later and is used to
compute resulting `Features` vector.
Layout information for a record gets stored in the `ClangASTImporter`
associated with the `DWARFASTParserClang` that originally parsed the
record. LLDB sometimes moves clang types from one AST to another (in the
reproducer the origin AST was a precompiled-header and the destination
was the AST backing the executable). When clang then asks LLDB to
`layoutRecordType`, it will do so with the help of the
`ClangASTImporter` the type is associated with. If the type's origin is
actually in a different LLDB module (and thus a different
`DWARFASTParserClang` was used to set its layout info), we won't find
the layout info in our local `ClangASTImporter`.
In the reproducer this meant we would drop the alignment info of the
origin type and misread a variable's contents with `frame var` and
`expr`.
There is logic in `ClangASTSource::layoutRecordType` to import an
origin's layout info. This patch re-uses that infrastructure to import
an origin's layout from one `ClangASTImporter` instance to another.
rdar://123274144
This patch moves the logic for copying the layout info of a
`RecordDecl`s origin into a target AST.
A follow-up patch re-uses the logic from within the `ClangASTImporter`,
so the natural choice was to move it there.
There was a think-o in a previous commit that made us only able to
define 1 line commands when using command script add interactively.
There was also no test for this feature, so I fixed the think-o and
added a test.
Partly, there's just a lot of unnecessary boiler plate. It's also
possible to define combinations of arguments that make no sense (e.g.
eArgRepeatPlus followed by eArgRepeatPlain...) but these are never
checked since we just push_back directly into the argument definitions.
This commit is step 1 of this cleanup - do the obvious stuff. In it, all
the simple homogenous argument lists and the breakpoint/watchpoint
ID/Range types, are set with common functions. This is an NFC change, it
just centralizes boiler plate. There's no checking yet because you can't
get a single argument wrong.
The end goal is that all argument definition goes through functions and
m_arguments is hidden so that you can't define inconsistent argument
sets.