Commit Graph

69 Commits

Author SHA1 Message Date
Jordan Rupprecht
1eeeab82c6 [lldb][test] Modernize assertEqual(value, bool) (#82526)
Any time we see the pattern `assertEqual(value, bool)`, we can replace
that with `assert<bool>(value)`. Likewise for `assertNotEqual`.

Technically this relaxes the test a bit, as we may want to make sure
`value` is either `True` or `False`, and not something that implicitly
converts to a bool. For example, `assertEqual("foo", True)` will fail,
but `assertTrue("foo")` will not. In most cases, this distinction is not
important.

There are two such places that this patch does **not** transform, since
it seems intentional that we want the result to be a bool:
*
5daf2001a1/lldb/test/API/python_api/sbstructureddata/TestStructuredDataAPI.py (L90)
*
5daf2001a1/lldb/test/API/commands/settings/TestSettings.py (L940)

Followup to 9c2468821e. I patched `teyit`
with a `visit_assertEqual` node handler to generate this.
2024-02-21 20:39:02 -06:00
Jordan Rupprecht
9c2468821e [lldb][test] Modernize asserts (#82503)
This uses [teyit](https://pypi.org/project/teyit/) to modernize asserts,
as recommended by the [unittest release
notes](https://docs.python.org/3.12/whatsnew/3.12.html#id3).

For example, `assertTrue(a == b)` is replaced with `assertEqual(a, b)`.
This produces better error messages, e.g. `error: unexpectedly found 1
and 2 to be different` instead of `error: False`.
2024-02-21 13:02:30 -06:00
Nicholas Mosier
a50ea2f76f [lldb] Fix Intel PT plugin compile errors (#77252)
Fix #77251.
2024-01-09 10:58:47 -08:00
Jonas Devlieghere
2238dcc393 [NFC][Py Reformat] Reformat python files in lldb
This is an ongoing series of commits that are reformatting our Python
code. Reformatting is done with `black` (23.1.0).

If you end up having problems merging this commit because you have made
changes to a python file, the best way to handle that is to run `git
checkout --ours <yourfile>` and then reformat it with black.

RFC: https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style

Differential revision: https://reviews.llvm.org/D151460
2023-05-25 12:54:09 -07:00
Dave Lee
1fb5c7a2f1 [lldb] Rewrite to assertEqual/assertNotEqual (NFC)
Using the more specific assert* methods results in more useful error message.
2022-11-11 17:03:02 -08:00
Jakob Johnson
17c65e51b9 [intelpt] Update Python tests to account for new errrors
Update the Python tests (ie tests run via `lldb-dotest -p TestTrace`) to
handle new error introduced in D136610.

Test Plan:
`lldb-dotest -p TestTrace`

Differential Revision: https://reviews.llvm.org/D136801
2022-10-27 05:09:08 -07:00
Walter Erquinigo
2098e2f472 [trace][intel pt][simple] Fix errors after switching to libipt's top of tree
These tests were being tested against a version of libipt from last
year. We just updated libipt to top of tree and many errors broke
because the new version of libipt emits more events than the older one,
which is fine.

`./bin/lldb-dotest -p TestTrace` passes
2022-10-25 14:34:27 -07:00
Walter Erquinigo
c49d14aca5 [trace][intel pt] Simple detection of infinite decoding loops
The low-level decoder might fall into an infinite decoding loop for
various reasons, the simplest being an infinite direct loop reached due
to wrong handling of self-modified code in the kernel, e.g. it might
reach

```
0x0A: pause
0x0C: jump to 0x0A
```

In this case, all the code is sequential and requires no packets to be
decoded. The low-level decoder would produce an output like the
following

```
0x0A: pause
0x0C: jump to 0x0A
0x0A: pause
0x0C: jump to 0x0A
0x0A: pause
0x0C: jump to 0x0A
... infinite amount of times
```

These cases require stopping the decoder to avoid infinite work and signal this
at least as a trace error.

- Add a check that breaks decoding of a single PSB once 500k instructions have been decoded since the last packet was processed.
- Add a check that looks for infinite loops after certain amount of instructions have been decoded since the last packet was processed.
- Add some `settings` properties for tweaking the thresholds of the checks above. This is also nice because it does the basic work needed for future settings.
- Add an AnomalyDetector class that inspects the DecodedThread and the libipt decoder in search for anomalies. These anomalies are then signaled as fatal errors in the trace.
- Add an ErrorStats class that keeps track of all the errors in a DecodedThread, with a special counter for fatal errors.
- Add an entry for decoded thread errors in the `dump info` command.

Some notes are added in the code and in the documention of the settings,
so please read them.

Besides that, I haven't been unable to create a test case in LLVM style, but
I've found an anomaly in the thread #12 of the trace
72533820-3eb8-4465-b8e4-4e6bf0ccca99 at Meta. We have to figure out how to
artificially create traces with this kind of anomalies in LLVM style.

With this change, that anomalous thread now shows:

```
(lldb)thread trace dump instructions 12 -e -i 23101

thread #12: tid = 8
    ...missing instructions
    23101: (error) anomalous trace: possible infinite loop detected of size 2
  vmlinux-5.12.0-0_fbk8_clang_6656_gc85768aa64da`panic_smp_self_stop + 5 [inlined] rep_nop at processor.h:13:2
    23100: 0xffffffff81342785    pause
  vmlinux-5.12.0-0_fbk8_clang_6656_gc85768aa64da`panic_smp_self_stop + 7 at panic.c:87:2
    23099: 0xffffffff81342787    jmp    0xffffffff81342785        ; <+5> [inlined] rep_nop at processor.h:13:2
  vmlinux-5.12.0-0_fbk8_clang_6656_gc85768aa64da`panic_smp_self_stop + 5 [inlined] rep_nop at processor.h:13:2
    23098: 0xffffffff81342785    pause
  vmlinux-5.12.0-0_fbk8_clang_6656_gc85768aa64da`panic_smp_self_stop + 7 at panic.c:87:2
    23097: 0xffffffff81342787    jmp    0xffffffff81342785        ; <+5> [inlined] rep_nop at processor.h:13:2
  vmlinux-5.12.0-0_fbk8_clang_6656_gc85768aa64da`panic_smp_self_stop + 5 [inlined] rep_nop at processor.h:13:2
    23096: 0xffffffff81342785    pause
  vmlinux-5.12.0-0_fbk8_clang_6656_gc85768aa64da`panic_smp_self_stop + 7 at panic.c:87:2
    23095: 0xffffffff81342787    jmp    0xffffffff81342785        ; <+5> [inlined] rep_nop at processor.h:13:2
```

It used to be in an infinite loop where the decoder never stopped.

Besides that, the dump info command shows

```
(lldb) thread trace dump info 12

 Errors:
    Number of individual errors: 32
      Number of fatal errors: 1
      Number of other errors: 31
```

and in json format

```
(lldb) thread trace dump info 12 -j

 "errors": {
      "totalCount": 32,
      "libiptErrors": {},
      "fatalErrors": 1,
      "otherErrors": 31
    }
```

Differential Revision: https://reviews.llvm.org/D136557
2022-10-25 10:20:49 -07:00
Walter Erquinigo
f1e63855b0 [lldb][trace] Add a basic function call dump [3] - Add a JSON dumper
The JSON dumper is very minimalistic. It pretty much only shows the
delimiting instruction IDs of every segment, so that further queries to
the SBCursor can be used to make sense of the data. It's main purpose is
to be serialized somewhat cheaply.

I also renamed untracedSegment to untracedPrefixSegment, in case in the
future we add an untracedSuffixSegment. In any case, this new name is
more explicit, which I like.

Differential Revision: https://reviews.llvm.org/D136034
2022-10-18 13:57:53 -07:00
Walter Erquinigo
840d861d6e [lldb][trace] Add a basic function call dump [2] - Implement the reconstruction algorithm
This diff implements the reconstruction algorithm for the call tree and
add tests.

See TraceDumper.h for documentation and explanations.

One important detail is that the tree objects are in TraceDumper, even
though Trace.h is a better home. I'm leaving that as future work.

Another detail is that this code is as slow as dumping the entire
symolicated trace, which is not that bad tbh. The reason is that we use
symbols throughout the algorithm and we are not being careful about
memory and speed. This is also another area for future improvement.

Lastly, I made sure that incomplete traces work, i.e. you start tracing
very deep in the stack or failures randomly appear in the trace.

Differential Revision: https://reviews.llvm.org/D135917
2022-10-18 13:57:53 -07:00
Walter Erquinigo
566146c03b [lldb][trace] Add a basic function call dumpdump [1] - Add the command scaffolding
The command is thread trace dump function-calls and as minimum will
require printing to a file in json and non-json format

I added a test

Differential Revision: https://reviews.llvm.org/D135521
2022-10-18 13:57:52 -07:00
Walter Erquinigo
e17cae076c [trace][intel pt] Fix per-psb packet decoding
The per-PSB packet decoding logic was wrong because it was assuming that pt_insn_get_sync_offset was being udpated after every PSB. Silly me, that is not true. It returns the offset of the PSB packet after invoking pt_insn_sync_forward regardless of how many PSBs are visited later. Instead, I'm now following the approach described in https://github.com/intel/libipt/blob/master/doc/howto_libipt.md#parallel-decode for parallel decoding, which is basically what we need.

A nasty error that happened because of this is that when we had two PSBs (A and B), the following was happening

1. PSB A was processed all the way up to the end of the trace, which includes PSB B.
2. PSB B was then processed until the end of the trace.

The instructions emitted by step 2. were also emitted as part of step 1. so our trace had duplicated chunks. This problem becomes worse when you many PSBs.

As part of making sure this diff is correct, I added some other features that are very useful.

- Added a "synchronization point" event to the TraceCursor, so we can inspect when PSBs are emitted.
- Removed the single-thread decoder. Now the per-cpu decoder and single-thread decoder use the same code paths.
- Use the query decoder to fetch PSBs and timestamps. It turns out that the pt_insn_sync_forward of the instruction decoder can move past several PSBs (this means that we could skip some TSCs). On the other hand, the pt_query_sync_forward method doesn't skip PSBs, so we can get more accurate sync events and timing information.
- Turned LibiptDecoder into PSBBlockDecoder, which decodes single PSB blocks. It is the fundamental processing unit for decoding.
- Added many comments, asserts and improved error handling for clarity.
- Improved DecodeSystemWideTraceForThread so that a TSC is emitted always before a cpu change event. This was a bug that was annoying me before.
- SplitTraceInContinuousExecutions and FindLowestTSCInTrace are now using the query decoder, which can identify precisely each PSB along with their TSCs.
- Added an "only-events" option to the trace dumper to inspect only events.

I did extensive testing and I think we should have an in-house testing CI. The LLVM buildbots are not capable of supporting testing post-mortem traces of hundreds of megabytes. I'll leave that for later, but at least for now the current tests were able to catch most of the issues I encountered when doing this task.

A sample output of a program that I was single stepping is the following. You can see that only one PSB is emitted even though stepping happened!

```
thread #1: tid = 3578223
    0: (event) trace synchronization point [offset = 0x0xef0]
  a.out`main + 20 at main.cpp:29:20
    1: 0x0000000000402479    leaq   -0x1210(%rbp), %rax
    2: (event) software disabled tracing
    3: 0x0000000000402480    movq   %rax, %rdi
    4: (event) software disabled tracing
    5: (event) software disabled tracing
    6: 0x0000000000402483    callq  0x403bd4                  ; std::vector<int, std::allocator<int>>::vector at stl_vector.h:391:7
    7: (event) software disabled tracing
  a.out`std::vector<int, std::allocator<int>>::vector() at stl_vector.h:391:7
    8: 0x0000000000403bd4    pushq  %rbp
    9: (event) software disabled tracing
    10: 0x0000000000403bd5    movq   %rsp, %rbp
    11: (event) software disabled tracing
```

This is another trace of a long program with a few PSBs.
```
(lldb) thread trace dump instructions -E -f                                                                                                         thread #1: tid = 3603082
    0: (event) trace synchronization point [offset = 0x0x80]
    47417: (event) software disabled tracing
    129231: (event) trace synchronization point [offset = 0x0x800]
    146747: (event) software disabled tracing
    246076: (event) software disabled tracing
    259068: (event) trace synchronization point [offset = 0x0xf78]
    259276: (event) software disabled tracing
    259278: (event) software disabled tracing
    no more data
```

Differential Revision: https://reviews.llvm.org/D131630
2022-08-12 15:13:48 -07:00
Walter Erquinigo
6fb744be76 [trace][intel pt] Support a new kernel section in LLDB’s trace bundle schema
Add a new "kernel" section with following schema.

```
"kernel": {
  "loadAddress"?: decimal | hex string | string decimal
  # This is optional. If it's not specified, use default address 0xffffffff81000000.
  "file": string
  # path to the kernel image
}
```

Here's more details of the diff:
- If "kernel" section exist, it means current tracing mode is //KernelMode//.
- If tracing mode is //KernelMode//, the "processes" section must be empty and the "kernel" and "cpus" section must be provided. This is tested with `TestTraceLoad`.
- "kernel" section is parsed and turned into a new process with a single module which is the kernel image. The kernel process has N fake threads, one for each cpu.

Reviewed By: wallace

Differential Revision: https://reviews.llvm.org/D130805
2022-08-04 17:15:08 -07:00
Jakob Johnson
f9b4ea0ce9 [trace] Add SBTraceCursor bindings
Add bindings for the `TraceCursor` to allow for programatic traversal of
traces.
This diff adds bindings for all public `TraceCursor` methods except
`GetHwClock` and also adds `SBTrace::CreateNewCursor`. A new unittest
has been added to TestTraceLoad.py that uses the new `SBTraceCursor` API
to test that the sequential and random access APIs of the `TraceCursor`
are equivalent.

This diff depends on D130925.

Test Plan:
`ninja lldb-dotest && ./bin/lldb-dotest -p TestTraceLoad`

Differential Revision: https://reviews.llvm.org/D130930
2022-08-02 16:55:33 -07:00
Walter Erquinigo
4f676c2599 [trace][intel pt] Introduce wall clock time for each trace item
- Decouple TSCs from trace items
- Turn TSCs into events just like CPUs. The new name is HW clock tick, wich could be reused by other vendors.
- Add a GetWallTime that returns the wall time that the trace plug-in can infer for each trace item.
- For intel pt, we are doing the following interpolation: if an instruction takes less than 1 TSC, we use that duration, otherwise, we assume the instruction took 1 TSC. This helps us avoid having to handle context switches, changes to kernel, idle times, decoding errors, etc. We are just trying to show some approximation and not the real data. For the real data, TSCs are the way to go. Besides that, we are making sure that no two trace items will give the same interpolation value. Finally, we are using as time 0 the time at which tracing started.

Sample output:

```
(lldb) r
Process 750047 launched: '/home/wallace/a.out' (x86_64)
Process 750047 stopped
* thread #1, name = 'a.out', stop reason = breakpoint 1.1
    frame #0: 0x0000000000402479 a.out`main at main.cpp:29:20
   26   };
   27
   28   int main() {
-> 29     std::vector<int> vvv;
   30     for (int i = 0; i < 100; i++)
   31       vvv.push_back(i);
   32
(lldb) process trace start -s 64kb -t --per-cpu
(lldb) b 60
Breakpoint 2: where = a.out`main + 1689 at main.cpp:60:23, address = 0x0000000000402afe
(lldb) c
Process 750047 resuming
Process 750047 stopped
* thread #1, name = 'a.out', stop reason = breakpoint 2.1
    frame #0: 0x0000000000402afe a.out`main at main.cpp:60:23
   57     map<int, int> m;
   58     m[3] = 4;
   59
-> 60     map<string, string> m2;
   61     m2["5"] = "6";
   62
   63     std::vector<std::string> vs = {"2", "3"};
(lldb) thread trace dump instructions -t -f -e thread #1: tid = 750047
    0: [379567.000 ns] (event) HW clock tick [48599428476224707]
    1: [379569.000 ns] (event) CPU core changed [new CPU=2]
    2: [390487.000 ns] (event) HW clock tick [48599428476246495]
    3: [1602508.000 ns] (event) HW clock tick [48599428478664855]
    4: [1662745.000 ns] (event) HW clock tick [48599428478785046]
  libc.so.6`malloc
    5: [1662746.995 ns] 0x00007ffff7176660    endbr64
    6: [1662748.991 ns] 0x00007ffff7176664    movq   0x32387d(%rip), %rax      ;  + 408
    7: [1662750.986 ns] 0x00007ffff717666b    pushq  %r12
    8: [1662752.981 ns] 0x00007ffff717666d    pushq  %rbp
    9: [1662754.977 ns] 0x00007ffff717666e    pushq  %rbx
    10: [1662756.972 ns] 0x00007ffff717666f    movq   (%rax), %rax
    11: [1662758.967 ns] 0x00007ffff7176672    testq  %rax, %rax
    12: [1662760.963 ns] 0x00007ffff7176675    jne    0x9c7e0                   ; <+384>
    13: [1662762.958 ns] 0x00007ffff717667b    leaq   0x17(%rdi), %rax
    14: [1662764.953 ns] 0x00007ffff717667f    cmpq   $0x1f, %rax
    15: [1662766.949 ns] 0x00007ffff7176683    ja     0x9c730                   ; <+208>
    16: [1662768.944 ns] 0x00007ffff7176730    andq   $-0x10, %rax
    17: [1662770.939 ns] 0x00007ffff7176734    cmpq   $-0x41, %rax
    18: [1662772.935 ns] 0x00007ffff7176738    seta   %dl
    19: [1662774.930 ns] 0x00007ffff717673b    jmp    0x9c690                   ; <+48>
    20: [1662776.925 ns] 0x00007ffff7176690    cmpq   %rdi, %rax
    21: [1662778.921 ns] 0x00007ffff7176693    jb     0x9c7b0                   ; <+336>
    22: [1662780.916 ns] 0x00007ffff7176699    testb  %dl, %dl
    23: [1662782.911 ns] 0x00007ffff717669b    jne    0x9c7b0                   ; <+336>
    24: [1662784.906 ns] 0x00007ffff71766a1    movq   0x3236c0(%rip), %r12      ;  + 24
(lldb) thread trace dump instructions -t -f -e -J -c 4
[
  {
    "id": 0,
    "timestamp_ns": "379567.000000",
    "event": "HW clock tick",
    "hwClock": 48599428476224707
  },
  {
    "id": 1,
    "timestamp_ns": "379569.000000",
    "event": "CPU core changed",
    "cpuId": 2
  },
  {
    "id": 2,
    "timestamp_ns": "390487.000000",
    "event": "HW clock tick",
    "hwClock": 48599428476246495
  },
  {
    "id": 3,
    "timestamp_ns": "1602508.000000",
    "event": "HW clock tick",
    "hwClock": 48599428478664855
  },
  {
    "id": 4,
    "timestamp_ns": "1662745.000000",
    "event": "HW clock tick",
    "hwClock": 48599428478785046
  },
  {
    "id": 5,
    "timestamp_ns": "1662746.995324",
    "loadAddress": "0x7ffff7176660",
    "module": "libc.so.6",
    "symbol": "malloc",
    "mnemonic": "endbr64"
  },
  {
    "id": 6,
    "timestamp_ns": "1662748.990648",
    "loadAddress": "0x7ffff7176664",
    "module": "libc.so.6",
    "symbol": "malloc",
    "mnemonic": "movq"
  },
  {
    "id": 7,
    "timestamp_ns": "1662750.985972",
    "loadAddress": "0x7ffff717666b",
    "module": "libc.so.6",
    "symbol": "malloc",
    "mnemonic": "pushq"
  },
  {
    "id": 8,
    "timestamp_ns": "1662752.981296",
    "loadAddress": "0x7ffff717666d",
    "module": "libc.so.6",
    "symbol": "malloc",
    "mnemonic": "pushq"
  }
]
```

Differential Revision: https://reviews.llvm.org/D130054
2022-07-26 12:05:23 -07:00
ymeng
0466d1df23 [trace][intel pt] Support dumping the trace info in json
Thanks to ymeng@fb.com for coming up with this change.

`thread trace dump info` can dump some metrics that can be useful for
analyzing the performance and quality of a trace. This diff adds a --json
option for dumping this information in json format that can be easily
understood my machines.

Differential Revision: https://reviews.llvm.org/D129332
2022-07-13 12:26:11 -07:00
Walter Erquinigo
4a843d9282 [trace][intel pt] Create a CPU change event and expose it in the dumper
Thanks to fredzhou@fb.com for coming up with this feature.

When tracing in per-cpu mode, we have information of in which cpu we are execution each instruction, which comes from the context switch trace. This diff makes this information available as a `cpu changed event`, which an additional accessor in the cursor `GetCPU()`. As cpu changes are very infrequent, any consumer should listen to cpu change events instead of querying the actual cpu of a trace item. Once a cpu change event is seen, the consumer can invoke GetCPU() to get that information. Also, it's possible to invoke GetCPU() on an arbitrary instruction item, which will return the last cpu seen. However, this call is O(logn) and should be used sparingly.

Manually tested with a sample program that starts on cpu 52, then goes to 18, and then goes back to 52.

Differential Revision: https://reviews.llvm.org/D129340
2022-07-13 12:26:11 -07:00
Walter Erquinigo
b532dd545f [trace] Add an option to save a compact trace bundle
A trace bundle contains many trace files, and, in the case of intel pt, the
largest files are often the context switch traces because they are not
compressed by default. As a way to improve this, I'm adding a --compact option
to the `trace save` command that filters out unwanted processes from the
context switch traces. Eventually we can do the same for intel pt traces as
well.

Differential Revision: https://reviews.llvm.org/D129239
2022-07-13 11:43:28 -07:00
rnofenko
db73a52d7b [trace][intel pt] Add a nice parser for the trace size
Thanks to rnofenko@fb.com for coming up with these changes.

This diff adds support for passing units in the trace size inputs. For example,
it's now possible to specify 64KB as the trace size, instead of the
problematic 65536. This makes the user experience a bit friendlier.

Differential Revision: https://reviews.llvm.org/D129613
2022-07-13 10:53:14 -07:00
Walter Erquinigo
a7d6c3effe [trace] Make events first class items in the trace cursor and rework errors
We want to include events with metadata, like context switches, and this
requires the API to handle events with payloads (e.g. information about
such context switches). Besides this, we want to support multiple
similar events between two consecutive instructions, like multiple
context switches. However, the current implementation is not good for this because
we are defining events as bitmask enums associated with specific
instructions. Thus, we need to decouple instructions from events and
make events actual items in the trace, just like instructions and
errors.

- Add accessors in the TraceCursor to know if an item is an event or not
- Modify from the TraceDumper all the way to DecodedThread to support
- Renamed the paused event to disabled.
- Improved the tsc handling logic. I was using an API for getting the tsc from libipt, but that was an overkill that should be used when not processing events manually, but as we are already processing events, we can more easily get the tscs.
event items. Fortunately this simplified many things
- As part of this refactor, I also fixed and long stating issue, which is that some non decoding errors were being inserted in the decoded thread. I changed this so that TraceIntelPT::Decode returns an error if the decoder couldn't be set up proplerly. Then, errors within a trace are actual anomalies found in between instrutions.

All test pass

Differential Revision: https://reviews.llvm.org/D128576
2022-06-29 09:19:51 -07:00
Walter Erquinigo
b8dcd0ba26 [NFC][lldb][trace] Rename trace session to trace bundle
As previously discussed with @jj10306, we didn't really have a name for
the post-mortem (or offline) trace session representation, which is in
fact a folder with a bunch of files. We decided to call this folder
"trace bundle", and the main JSON file in it "trace bundle description
file". This naming is pretty decent, so I'm refactoring all the existing
code to account for that.

Differential Revision: https://reviews.llvm.org/D128484
2022-06-24 08:41:33 -07:00
Walter Erquinigo
efbfde0dd0 [trace] Add an option to dump instructions in json and to a file
In order to provide simple scripting support on top of instruction traces, a simple solution is to enhance the `dump instructions` command and allow printing in json and directly to a file. The format is verbose and not space efficient, but it's not supposed to be used for really large traces, in which case the TraceCursor API is the way to go.

- add a -j option for printing the dump in json
- add a -J option for pretty printing the json output
- add a -F option for specifying an output file
- add a -a option for dumping all the instructions available starting at the initial point configured with the other flags
- add tests for all cases
- refactored the instruction dumper and abstracted the actual "printing" logic. There are two writer implementations: CLI and JSON. This made the dumper itself much more readable and maintanable

sample output:

```
(lldb) thread trace dump instructions  -t -a --id 100 -J
[
  {
    "id": 100,
    "tsc": "43591204528448966"
    "loadAddress": "0x407a91",
    "module": "a.out",
    "symbol": "void std::deque<Foo, std::allocator<Foo>>::_M_push_back_aux<Foo>(Foo&&)",
    "mnemonic": "movq",
    "source": "/usr/include/c++/8/bits/deque.tcc",
    "line": 492,
    "column": 30
  },
  ...
```

Differential Revision: https://reviews.llvm.org/D128316
2022-06-22 11:14:22 -07:00
Jakob Johnson
50f9367960 Add LoadTraceFromFile to SBDebugger and SBTrace
Add trace load functionality to SBDebugger via the `LoadTraceFromFile` method.
Update intelpt test case class to have `testTraceLoad` method so we can take advantage of
the testApiAndSB decorator to test both the CLI and SB without duplicating code.

Differential Revision: https://reviews.llvm.org/D128107
2022-06-20 11:54:47 -07:00
Dave Lee
4cc8f2a017 [lldb][tests] Automatically call compute_mydir (NFC)
Eliminate boilerplate of having each test manually assign to `mydir` by calling
`compute_mydir` in lldbtest.py.

Differential Revision: https://reviews.llvm.org/D128077
2022-06-17 14:34:49 -07:00
Walter Erquinigo
9f45f23d86 [trace][intelpt] Support system-wide tracing [21] - Support long numbers in JSON
llvm's JSON parser supports 64 bit integers, but other tools like the
ones written in JS don't support numbers that big, so we need to
represent these possibly big numbers as a string. This diff uses that to
represent addresses and tsc zero. The former is printed in hex for and
the latter in decimal string form. The schema was updated mentioning
that.

Besides that, I fixed some remaining issues and now all test pass. Before I wasn't running all tests because for some reason my computer reverted perf_paranoid to 1.

Differential Revision: https://reviews.llvm.org/D127819
2022-06-16 11:42:22 -07:00
Walter Erquinigo
6a5355e8a1 [trace][intelpt] Support system-wide tracing [20] - Rename some fields in the schema
As discusses offline with @jj10305, we are updating some naming used throughout the code, specially in the json schema

- traceBuffer -> iptTrace
- core -> cpu

Differential Revision: https://reviews.llvm.org/D127817
2022-06-16 11:42:22 -07:00
Walter Erquinigo
03cc58ff2a [trace][intelpt] Support system-wide tracing [17] - Some improvements
This improves several things and addresses comments up to the diff [11] in this stack.

- Simplify many functions to receive less parameters that they can identify easily
- Create Storage classes for Trace and TraceIntelPT that can make it easier to reason about what can change with live process refreshes and what cannot.
- Don't cache the perf zero conversion numbers in lldb-server to make sure we get the most up-to-date numbers.
- Move the thread identifaction from context switches to the bundle parser, to leave TraceIntelPT simpler. This also makes sure that the constructor of TraceIntelPT is invoked when the entire data has been checked to be correct.
- Normalize all bundle paths before the Processes, Threads and Modules are created, so that they can assume that all paths are correct and absolute
- Fix some issues in the tests. Now they all pass.
- return the specific instance when constructing PerThread and MultiCore processor tracers.
- Properly implement IntelPTMultiCoreTrace::TraceStart.
- Improve some comments.
- Use the typedef ContextSwitchTrace more often for clarity.
- Move CreateContextSwitchTracePerfEvent to Perf.h as a utility function.
- Synchronize better the state of the context switch and the intel pt
perf events.
- Use a booblean instead of an enum for the PerfEvent state.

Differential Revision: https://reviews.llvm.org/D127456
2022-06-16 11:23:02 -07:00
Walter Erquinigo
ff15efc1a7 [trace][intelpt] Support system-wide tracing [16] - Create threads automatically from context switch data in the post-mortem case
For some context, The context switch data contains information of which threads were
executed by each traced process, therefore it's not necessary to specify
them in the trace file.

So this diffs adds support for that automatic feature. Eventually we
could include it to live processes as well.

Differential Revision: https://reviews.llvm.org/D127001
2022-06-16 11:23:02 -07:00
Walter Erquinigo
ef9970759b [trace][intelpt] Support system-wide tracing [15] - Make triple optional
The process triple should only be needed when LLDB can't identify the correct
triple on its own. Examples could be universal mach-o binaries. But in any case,
at least for most of ELF files, LLDB should be able to do the job without having
the user specify the triple manually.

Differential Revision: https://reviews.llvm.org/D126990
2022-06-16 11:23:01 -07:00
Walter Erquinigo
a19fcc2bec [trace][intelpt] Support system-wide tracing [14] - Decode per cpu
This is the final functional patch to support intel pt decoding per cpu.
It works by doing the following:

- First, all context switches are split by tid and sorted in order. This produces a list of continuous executes per thread per core.
- Then, all intel pt subtraces are split by PSB boundaries and assigned to individual thread continuous executions on the same core by doing simple TSC-based comparisons.
- With this, we have, per thread, a sorted list of continuous executions each one with a list of intel pt subtraces. Up to this point, this is really fast because no instructions were actually decoded.
- Then, each thread can be decoded by traversing their continuous executions and intel pt subtraces. An advantage of having these continuous executions is that we can identify if a continuous exexecution doesn't have intel pt data, and thus has a gap in it. We can later to more sofisticated comparisons to identify if within a continuous execution there are gaps.

I'm adding a test as well.

Differential Revision: https://reviews.llvm.org/D126394
2022-06-16 11:23:01 -07:00
Walter Erquinigo
1a3f996972 [trace][intelpt] Support system-wide tracing [13] - Add context switch decoding
- Add the logic that parses all cpu context switch traces and produces blocks of continuous executions, which will be later used to assign intel pt subtraces to threads and to identify gaps. This logic can also identify if the context switch trace is malformed.
- The continuous executions blocks are able to indicate when there were some contention issues when producing the context switch trace. See the inline comments for more information.
- Update the 'dump info' command to show information and stats related to the multicore decoding flow, including timing about context switch decoding.
- Add the logic to conver nanoseconds to TSCs.
- Fix a bug when returning the context switches. Now they data returned makes sense and even empty traces can be returned from lldb-server.
- Finish the necessary bits for loading and saving a multi-core trace bundle from disk.
- Change some size_t to uint64_t for compatibility with 32 bit systems.

Tested by saving a trace session of a program that sleeps 100 times, it was able to produce the following 'dump info' text:

```
(lldb) trace load /tmp/trace3/trace.json                                                                   (lldb) thread trace dump info                                                                              Trace technology: intel-pt

thread #1: tid = 4192415
  Total number of instructions: 1

  Memory usage:
    Total approximate memory usage (excluding raw trace): 2.51 KiB
    Average memory usage per instruction (excluding raw trace): 2573.00 bytes

  Timing for this thread:

  Timing for global tasks:
    Context switch trace decoding: 0.00s

  Events:
    Number of instructions with events: 0
    Number of individual events: 0

  Multi-core decoding:
    Total number of continuous executions found: 2499
    Number of continuous executions for this thread: 102

  Errors:
    Number of TSC decoding errors: 0
```

Differential Revision: https://reviews.llvm.org/D126267
2022-06-16 11:23:01 -07:00
Walter Erquinigo
fc5ef57c7d [trace][intelpt] Support system-wide tracing [12] - Support multi-core trace load and save
:q!
This diff is massive, but it's because it connects the client with lldb-server
and also ensures that the postmortem case works.

- Flatten the postmortem trace schema. The reason is that the schema has become quite complex due to the new multicore case, which defeats the original purpose of having a schema that could work for every trace plug-in. At this point, it's better that each trace plug-in defines it's own full schema. This means that the only common field is "type".
-- Because of this new approach, I merged the "common" trace load and saving functionalities into the IntelPT one. This simplified the code quite a bit. If we eventually implement another trace plug-in, we can see then what we could reuse.
-- The new schema, which is flattened, has now better comments and is parsed better. A change I did was to disallow hex addresses, because they are a bit error prone. I'm asking now to print the address in decimal.
-- Renamed "intel" to "GenuineIntel" in the schema because that's what you see in /proc/cpuinfo.
- Implemented reading the context switch trace data buffer. I had to do
some refactors to do that cleanly.
-- A major change that I did here was to simplify the perf_event circular buffer reading logic. It was too complex. Maybe the original Intel author had something different in mind.
- Implemented all the necessary bits to read trace.json files with per-core data.
- Implemented all the necessary bits to save to disk per-core trace session.
- Added a test that ensures that parsing and saving to disk works.

Differential Revision: https://reviews.llvm.org/D126015
2022-06-15 13:28:36 -07:00
Walter Erquinigo
1f2d49a8e7 [trace][intelpt] Support system-wide tracing [10] - Return warnings and tsc information from lldb-server.
- Add a warnings field in the jLLDBGetState response, for warnings to be delivered to the client for troubleshooting. This removes the need to silently log lldb-server's llvm::Errors and not expose them easily to the user
- Simplify the tscPerfZeroConversion struct and schema. It used to extend a base abstract class, but I'm doubting that we'll ever add other conversion mechanisms because all modern kernels support perf zero. It is also the one who is supposed to work with the timestamps produced by the context switch trace, so expecting it is imperative.
- Force tsc collection for cpu tracing.
- Add a test checking that tscPerfZeroConversion is returned by the GetState request
- Add a pre-check for cpu tracing that makes sure that perf zero values are available.

Differential Revision: https://reviews.llvm.org/D125932
2022-06-15 12:08:00 -07:00
Walter Erquinigo
a758205951 [trace][intelpt] Support system-wide tracing [9] - Collect and return context switch traces
- Add collection of context switches per cpu grouped with the per-cpu intel pt traces.
- Move the state handling from the interl pt trace class to the PerfEvent one.
- Add support for stopping and enabling perf event groups.
- Return context switch entries as part of the jLLDBTraceGetState response.
- Move the triggers of whenever the process stopped or resumed. Now the will-resume notification is in a better location, which will ensure that we'll capture the instructions that will be executed.
- Remove IntelPTSingleBufferTraceUP. The unique pointer was useless.
- Add unit tests

Differential Revision: https://reviews.llvm.org/D125897
2022-06-15 12:07:59 -07:00
Walter Erquinigo
1f56f7fc16 [trace][intelpt] Support system-wide tracing [7] - Create a base IntelPTProcessTrace class
We have two different "process trace" implementations: per thread and per core. As a way to simplify the collector who uses both, I'm creating a base abstract class that is used by these implementations. This effectively simplify a good chunk of code.

Differential Revision: https://reviews.llvm.org/D125503
2022-06-15 12:07:59 -07:00
Walter Erquinigo
1f49714d3e [trace][intelpt] Support system-wide tracing [4] - Support per core tracing on lldb-server
This diffs implements per-core tracing on lldb-server. It also includes tests that ensure that tracing can be initiated from the client and that the jLLDBGetState ppacket returns the list of trace buffers per core.

This doesn't include any decoder changes.

Finally, this makes some little changes here and there improving the existing code.

A specific piece of code that can't reliably be tested is when tracing
per core fails due to permissions. In this case we add a
troubleshooting message and this is the manual test:

```
/proc/sys/kernel/perf_event_paranoid set to 1

(lldb) process trace start --per-core-tracing                                         error: perf event syscall failed: Permission denied
 You might need that /proc/sys/kernel/perf_event_paranoid has a value of 0 or -1.
``

Differential Revision: https://reviews.llvm.org/D124858
2022-05-17 12:46:54 -07:00
Walter Erquinigo
b8d1776fc5 [trace][intelpt] Support system-wide tracing [2] - Add a dummy --per-core-tracing option
This updates the documentation of the gdb-remote protocol, as well as the help messages, to include the new --per-core-tracing option.

Differential Revision: https://reviews.llvm.org/D124640
2022-05-09 16:05:26 -07:00
Walter Erquinigo
059f39d2f4 [trace][intel pt] Support events
A trace might contain events traced during the target's execution. For
example, a thread might be paused for some period of time due to context
switches or breakpoints, which actually force a context switch. Not only
that, a trace might be paused because the CPU decides to trace only a
specific part of the target, like the address filtering provided by
intel pt, which will cause pause events. Besides this case, other kinds
of events might exist.

This patch adds the method `TraceCursor::GetEvents()`` that returns the
list of events that happened right before the instruction being pointed
at by the cursor. Some refactors were done to make this change simpler.

Besides this new API, the instruction dumper now supports the -e flag
which shows pause events, like in the following example, where pauses
happened due to breakpoints.

```
thread #1: tid = 2717361
  a.out`main + 20 at main.cpp:27:20
    0: 0x00000000004023d9    leaq   -0x1200(%rbp), %rax
  [paused]
    1: 0x00000000004023e0    movq   %rax, %rdi
  [paused]
    2: 0x00000000004023e3    callq  0x403a62                  ; std::vector<int, std::allocator<int> >::vector at stl_vector.h:391:7
  a.out`std::vector<int, std::allocator<int> >::vector() at stl_vector.h:391:7
    3: 0x0000000000403a62    pushq  %rbp
    4: 0x0000000000403a63    movq   %rsp, %rbp
```

The `dump info` command has also been updated and now it shows the
number of instructions that have associated events.

Differential Revision: https://reviews.llvm.org/D123982
2022-04-25 19:01:23 -07:00
Walter Erquinigo
bdf3e7e5b8 [trace][intelpt] Add task timer classes
I'm adding two new classes that can be used to measure the duration of long
tasks as process and thread level, e.g. decoding, fetching data from
lldb-server, etc. In this first patch, I'm using it to measure the time it takes
to decode each thread, which is printed out with the `dump info` command. In a
later patch I'll start adding process-level tasks and I might move these
classes to the upper Trace level, instead of having them in the intel-pt
plugin. I might need to do that anyway in the future when we have to
measure HTR. For now, I want to keep the impact of this change minimal.

With it, I was able to generate the following info of a very big trace:

```
(lldb) thread trace dump info                                                                                                            Trace technology: intel-pt

thread #1: tid = 616081
  Total number of instructions: 9729366

  Memory usage:
    Raw trace size: 1024 KiB
    Total approximate memory usage (excluding raw trace): 123517.34 KiB
    Average memory usage per instruction (excluding raw trace): 13.00 bytes

  Timing:
    Decoding instructions: 1.62s

  Errors:
    Number of TSC decoding errors: 0
```

As seen above, it took 1.62 seconds to decode 9.7M instructions. This is great
news, as we don't need to do any optimization work in this area.

Differential Revision: https://reviews.llvm.org/D123357
2022-04-12 13:08:03 -07:00
Walter Erquinigo
05b4bf2571 [trace][intelpt] Introduce instruction Ids
In order to support quick arbitrary access to instructions in the trace, we need
each instruction to have an id. It could be an index or any other value that the
trace plugin defines.

This will be useful for reverse debugging or for creating callstacks, as each
frame will need an instruction id associated with them.

I've updated the `thread trace dump instructions` command accordingly. It now
prints the instruction id instead of relative offset. I've also added a new --id
argument that allows starting the dump from an arbitrary position.

Differential Revision: https://reviews.llvm.org/D122254
2022-04-06 12:19:36 -07:00
Alisamar Husain
d849959071 [lldb][intelpt] Remove IntelPTInstruction and move methods to DecodedThread
This is to reduce the size of the trace further and has appreciable results.

Differential Revision: https://reviews.llvm.org/D122991
2022-04-05 22:01:36 +05:30
Walter Erquinigo
1e5083a563 [trace][intel pt] Handle better tsc in the decoder
A problem that I introduced in the decoder is that I was considering TSC decoding
errors as actual instruction errors, which mean that the trace has a gap. This is
wrong because a TSC decoding error doesn't mean that there's a gap in the trace.
Instead, now I'm just counting how many of these errors happened and I'm using
the `dump info` command to check for this number.

Besides that, I refactored the decoder a little bit to make it simpler, more
readable, and to handle TSCs in a cleaner way.

Differential Revision: https://reviews.llvm.org/D122867
2022-04-02 11:06:26 -07:00
Alisamar Husain
ca922a3559 [intelpt] Refactor timestamps out of IntelPTInstruction
Storing timestamps (TSCs) in a more efficient map at the decoded thread level to speed up TSC lookup, as well as reduce the amount of memory used by each decoded instruction. Also introduced TSC range which keeps the current timestamp valid for all subsequent instructions until the next timestamp is emitted.

Differential Revision: https://reviews.llvm.org/D122603
2022-04-01 21:51:42 +05:30
Alisamar Husain
bcf1978a87 [intelpt] Refactoring instruction decoding for flexibility
Now the decoded thread has Append methods that provide more flexibility
in terms of the underlying data structure that represents the
instructions. In this case, we are able to represent the sporadic errors
as map and thus reduce the size of each instruction.

Differential Revision: https://reviews.llvm.org/D122293
2022-03-26 11:34:47 -07:00
Walter Erquinigo
a80c6c7d36 [trace] clear any existing tracing sessions before relaunching the binary
There's a bug caused when a process is relaunched: the target, which
doesn't change, keeps the Trace object from the previous process, which
is already defunct, and causes segmentation faults when it's attempted
to be used.
A fix is to clean up the Trace object when the target is disposing of
the previous process during relaunches.

A way to reproduce this:
```
lldb a.out
b main
r
process trace start
c
r
process trace start
```

Differential Revision: https://reviews.llvm.org/D122176
2022-03-21 16:03:37 -07:00
Alisamar Husain
ca47011e73 [tests][intelpt] Fix outdated trace load test
Differential Revision: https://reviews.llvm.org/D122114
2022-03-21 13:21:45 +05:30
Alisamar Husain
37a466dd72 [trace][intelpt] Added total memory usage by decoded trace
This fails currently but the basics are there

Differential Revision: https://reviews.llvm.org/D122093
2022-03-21 12:36:08 +05:30
Alisamar Husain
8271220a99 [trace][intelpt] Instruction count in trace info
Added a line to `thread trace dump info` results which shows total number of instructions executed until now.

Differential Revision: https://reviews.llvm.org/D122076
2022-03-20 11:28:16 +05:30
Walter Erquinigo
b7d525ad38 [trace][intelpt] fix some test failures
Minor fixes needed and now `./bin/lldb-dotest -p TestTrace` passes
correctly.

- There was an incorrect iteration.
- Some error messages changed.
- The way repeat commands are handled changed a bit, so I had to create
a new --continue arg in "thread trace dump instructions" to handle this
correctly.

Differential Revision: https://reviews.llvm.org/D122023
2022-03-18 10:35:34 -07:00
Walter Erquinigo
602497d672 [trace] [intel pt] Create a "process trace save" command
added new command "process trace save -d <directory>".
-it saves a JSON file as <directory>/trace.json, with the main properties of the trace session.
-it saves binary Intel-pt trace as <directory>/thread_id.trace; each file saves each thread.
-it saves modules to the directory <directory>/modules .
-it only works for live process and it only support Intel-pt right now.

Example:
```
b main
run
process trace start
n
process trace save -d /tmp/mytrace
```
A file named trace.json and xxx.trace should be generated in /tmp/mytrace. To load the trace that was just saved:
```
trace load /tmp/mytrace
thread trace dump instructions
```
You should see the instructions of the trace got printed.

To run a test:
```
cd ~/llvm-sand/build/Release/fbcode-x86_64/toolchain
ninja lldb-dotest
./bin/lldb-dotest -p TestTraceSave
```

Reviewed By: wallace

Differential Revision: https://reviews.llvm.org/D107669
2021-08-27 09:34:01 -07:00