retsnoop - v0.10 Latest Release

Published by github-actions[bot] about 2 months ago

Major changes

This is a big release with several major new features (we still stay within a minor version update, as there might still be some minor breaking changes). Most notable changes:

Function arguments capture (-A argument). Retsnoop can now capture all input arguments for all traced functions and print them in human-readable form. See README for more details.
Injected probes (-J). In addition to traced functions specified with -e and -a flags, it's now possible to also specify single-point injected probes (kprobes, kretprobes, tracepoints, and raw tracepoints). Note that for a kprobe it's possible to specify an extra offset (e.g., -J kprobe:bprm_execve+12), which allows tracing inlined functions and function internals (normally retsnoop only traces function entry and exit). See README for more details.
Retsnoop can also capture extra context for injected probes: just use -A and -J together. For kprobes and kretprobes, register state is captured; for tracepoints and raw tracepoints, their actual arguments are captured. See README for more details.
(Breaking change!) It's now possible to enable function call trace mode (-T) on its own, separately from the default call stack mode. The latter is now controlled with the -E flag. The important distinction, and a breaking change, is that function call trace mode on its own implies the --success-stacks/-S option, which makes the most sense for function call tracing. When retsnoop -E is specified, even with -T, the original behavior is kept: only erroring call stacks are traced and emitted (i.e., those that end up returning an error from the entry functions specified with -e arguments). So, in short:
- retsnoop -T emits all function call traces, both successful and erroring;
- retsnoop -E (or just retsnoop, as -E is the default mode) emits only erroring call stacks (no function call traces);
- retsnoop -E -S will emit call stacks only (no function call traces), but both erroring and successful ones;
- retsnoop -E -T will emit both call stacks and function call traces, but only erroring ones;
- retsnoop -E -T -S will do both call stacks and function call trace for both successful and erroring cases.
Added kernel module BTF support, improving tracing of functions defined in kernel modules.
Added advanced configuration options, specified with the -C flag. See retsnoop --config-help for the list of supported options and more details.
Significant rework of --help output.
Many smaller bug fixes and usability improvements.
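
The -E/-T/-S combinations listed above can be restated as a small truth table. The following is only an illustrative Python model of those release-note rules, not retsnoop's actual implementation; the names resolve_modes and Output are made up for this sketch, and treating a bare -S as "-E -S" is an assumption derived from -E being the default mode.

```python
from typing import NamedTuple


class Output(NamedTuple):
    call_stacks: bool         # emit call stacks
    call_traces: bool         # emit function call traces
    include_successful: bool  # also emit successful (non-erroring) cases


def resolve_modes(E: bool = False, T: bool = False, S: bool = False) -> Output:
    """Model of how -E, -T and -S combine, per the list in the release notes."""
    call_traces = T
    # -E is the default mode, so call stacks stay on unless -T is used alone.
    call_stacks = E or not T
    # -T on its own implies -S; an explicit -E restores errors-only output.
    include_successful = S or (T and not E)
    return Output(call_stacks, call_traces, include_successful)


if __name__ == "__main__":
    for flags in ({"T": True}, {"E": True}, {"E": True, "S": True},
                  {"E": True, "T": True}, {"E": True, "T": True, "S": True}):
        print(flags, resolve_modes(**flags))
```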

What's Changed

Fix typos in README by @Antiz96 in https://github.com/anakryiko/retsnoop/pull/57
Makefile: use LDFLAGS for linking by @martinetd in https://github.com/anakryiko/retsnoop/pull/58
Makefile: Update variables for package builds with external artifacts by @martinetd in https://github.com/anakryiko/retsnoop/pull/56
Makefile: Do not rebuild the sidecar if not the default path by @martinetd in https://github.com/anakryiko/retsnoop/pull/59
Support module BTF and lots of log improvements by @anakryiko in https://github.com/anakryiko/retsnoop/pull/60
Lbr improvements by @anakryiko in https://github.com/anakryiko/retsnoop/pull/61
Retsnoop session revamp by @anakryiko in https://github.com/anakryiko/retsnoop/pull/62
Retsnoop function args capture support by @anakryiko in https://github.com/anakryiko/retsnoop/pull/63
Retsnoop config and function args capture polish by @anakryiko in https://github.com/anakryiko/retsnoop/pull/64
Bump gimli stack by @michel-slm in https://github.com/anakryiko/retsnoop/pull/66
Retsnoop vararg support in printf-like functions by @anakryiko in https://github.com/anakryiko/retsnoop/pull/68
retsnoop: make func call trace and call stack modes independent by @anakryiko in https://github.com/anakryiko/retsnoop/pull/69
Retsnoop injection probes and other improvements by @anakryiko in https://github.com/anakryiko/retsnoop/pull/70
retsnoop: handle idle threads properly by @anakryiko in https://github.com/anakryiko/retsnoop/pull/71
Retsnoop improvements for LBR, stitched stacks, and interim stacks by @anakryiko in https://github.com/anakryiko/retsnoop/pull/72
Retsnoop ARM64 improvements by @anakryiko in https://github.com/anakryiko/retsnoop/pull/75

New Contributors

@Antiz96 made their first contribution in https://github.com/anakryiko/retsnoop/pull/57

Full Changelog: https://github.com/anakryiko/retsnoop/compare/v0.9.8...v0.10

retsnoop - v0.9.8

Published by github-actions[bot] 9 months ago

What's Changed

A few small fixes and clean ups by @anakryiko in https://github.com/anakryiko/retsnoop/pull/53
bpftool: Pass additional compile flags as EXTRA_CFLAGS, not CFLAGS by @qmonnet in https://github.com/anakryiko/retsnoop/pull/52
sidecar: Update addr2line dependency to 0.21 by @danielocfb in https://github.com/anakryiko/retsnoop/pull/55

New Contributors

@danielocfb made their first contribution in https://github.com/anakryiko/retsnoop/pull/55

Full Changelog: https://github.com/anakryiko/retsnoop/compare/v0.9.7...v0.9.8

retsnoop - retsnoop v0.9.7

Published by anakryiko about 1 year ago

What's Changed

Add release workflow for shipping binaries and combined sources by @qmonnet in https://github.com/anakryiko/retsnoop/pull/42 and https://github.com/anakryiko/retsnoop/pull/44
retsnoop: Fix sign extension failure logic by @anakryiko in https://github.com/anakryiko/retsnoop/pull/50
retsnoop: Remove a limit of 4096 attachable functions by @anakryiko in https://github.com/anakryiko/retsnoop/pull/48
retsnoop: Remove hard-coded maximum of 256 CPUs supported by @anakryiko in https://github.com/anakryiko/retsnoop/pull/49
retsnoop: Use bpf_probe_read_kernel() instead of bpf_probe_read() by @iii-i in https://github.com/anakryiko/retsnoop/pull/46
retsnoop: Pass the real envp to the sidecar by @erthalion in https://github.com/anakryiko/retsnoop/pull/47

New Contributors

@qmonnet made their first contribution in https://github.com/anakryiko/retsnoop/pull/42
@iii-i made their first contribution in https://github.com/anakryiko/retsnoop/pull/46
@erthalion made their first contribution in https://github.com/anakryiko/retsnoop/pull/47

Full Changelog: https://github.com/anakryiko/retsnoop/compare/v0.9.6...v0.9.7

retsnoop - retsnoop v0.9.6

Published by anakryiko over 1 year ago

What's Changed

Fix calibration unreliability on some new kernels by @anakryiko in https://github.com/anakryiko/retsnoop/pull/41

Full Changelog: https://github.com/anakryiko/retsnoop/compare/v0.9.5...v0.9.6

retsnoop - retsnoop v0.9.5

Published by anakryiko over 1 year ago

What's Changed

Massive improvements in how retsnoop determines whether kprobes are attachable:

add --debug multi-kprobe mode to bisect failing multi-kprobe attachment; it quickly narrows down and logs which kprobes were attempted but failed to be attached;
skip attaching to kernel functions that have a non-unique name when some of their instances are not traceable;
resolve an internal mix-up of function and data ksyms;
internal fixes to consistently take into account the kernel module to which a ksym/kprobe belongs.

Overall, these fixes and improvements make retsnoop's mass-attach behavior more reliable.

Full Changelog: https://github.com/anakryiko/retsnoop/compare/v0.9.4...v0.9.5

retsnoop - retsnoop v0.9.4

Published by anakryiko over 1 year ago

Bug fixes

fix IP (instruction pointer) fetching on non-x86_64 architectures on older kernels;
handle io_uring source code files better, after Linux code reorganization;
automatically pick debugfs (/sys/kernel/debug/tracing) or tracefs (/sys/kernel/tracing), whichever is available;
handle very old kernels that don't support BPF global data more gracefully.
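
The tracefs/debugfs selection mentioned above amounts to probing two well-known mount points. A minimal Python sketch of that idea follows; it is not retsnoop's code, the function name pick_tracing_dir is hypothetical, and the probing order (tracefs first) is an assumption, since the notes only say "whichever is available".

```python
import os
from typing import Callable, Optional, Sequence

# Paths are taken from the release notes; the order is an assumption.
TRACING_DIRS = ("/sys/kernel/tracing", "/sys/kernel/debug/tracing")


def pick_tracing_dir(
    candidates: Sequence[str] = TRACING_DIRS,
    exists: Callable[[str], bool] = os.path.isdir,
) -> Optional[str]:
    """Return the first candidate directory that exists, or None."""
    for path in candidates:
        if exists(path):
            return path
    return None


if __name__ == "__main__":
    print(pick_tracing_dir())
```

The exists parameter is injectable only so the selection logic can be exercised without touching the real filesystem.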

retsnoop - retsnoop v0.9.3

Published by anakryiko almost 2 years ago

What's Changed

retsnoop now supports DWARF-based symbolization (i.e.,
source code file/line info and inline functions) on
KASLR-enabled Linux kernels.

retsnoop - retsnoop v0.9.2

Published by anakryiko about 2 years ago

What's Changed

-F (fentry/fexit) mode now supports tracing void-returning functions;
a few fixes for -F (fentry/fexit) mode interacting weirdly with source code globs;
retsnoop can now be compiled for x86_64, i686, aarch64, ppc64le, s390x, and riscv64 architectures by using per-architecture pre-generated minimal vmlinux.h (see gen-vmlinux-headers.sh script);
retsnoop now builds a bootstrap (lightweight) version of bpftool from a submodule, which allows it to be compiled on multiple architectures. Previously retsnoop's Makefile relied on a checked-in pre-built x86_64 bpftool binary;
massive revamp of README.md;
usage text fixes and improvements.

Full Changelog: https://github.com/anakryiko/retsnoop/compare/v0.9.1...v0.9.2

retsnoop - retsnoop v0.9.1

Published by anakryiko about 2 years ago

A few nice improvements with no major new features:

dropped the requirement for /proc/config.gz presence for multi-kprobe detection (just using BPF CO-RE now for detection);
use dynamically allocated internal formatting buffers for stacks and traces, thus allowing much larger traces without dropping any information (at the expense of more memory usage, of course);
force-flush stdout before (potentially very long) detachment to improve retsnoop usage in scripts;
emit detected features when printing version and --verbose flag is specified:

$ sudo ./retsnoop -Vv
retsnoop v0.9.1
Feature detection:
        BPF ringbuf map supported: yes
        bpf_get_func_ip() supported: yes
        bpf_get_branch_snapshot() supported: yes
        BPF cookie supported: yes
        multi-attach kprobe supported: yes
Feature calibration:
        kretprobe IP offset: 4
        fexit sleep fix: yes
        fentry re-entry protection: yes

All just nice quality of life improvements. Enjoy!

retsnoop - retsnoop v0.9

Published by anakryiko about 2 years ago

--trace (-T) function calls trace mode

Add function call trace output, in addition to default stack trace and LBR output.

Example:

$ sudo ./retsnoop -e '*sys_bpf' -v -n simfail -a ':kernel/bpf/syscall.c' -a ':kernel/bpf/verifier.c' -T
...
Receiving data...
15:50:12.413878 -> 15:50:12.414193 TID/PID 1755152/1755152 (simfail/simfail):

FUNCTION CALLS TRACE                    RESULT                 DURATION
-------------------------------------   --------------------  ---------
→ __x64_sys_bpf
    → __sys_bpf
        ↔ bpf_check_uarg_tail_zero      [0]                     0.341us
        → bpf_raw_tracepoint_open
            ↔ __bpf_prog_get            [0xffffc9000c93d000]    0.255us
            → bpf_tracing_prog_attach
                ↔ bpf_link_prime        [0]                     2.530us
                ↔ bpf_link_cleanup      [void]                  3.435us
            ← bpf_tracing_prog_attach   [-ENOTSUPP]           306.161us
        ← bpf_raw_tracepoint_open       [-ENOTSUPP]           310.147us
    ← __sys_bpf                         [-ENOTSUPP]           314.846us
← __x64_sys_bpf                         [-ENOTSUPP]           315.515us

                      entry_SYSCALL_64_after_hwframe+0x44  (arch/x86/entry/entry_64.S:112:0)
                      do_syscall_64+0x2d                   (arch/x86/entry/common.c:46:12)
   315us [-ENOTSUPP]  __x64_sys_bpf+0x1c                   (kernel/bpf/syscall.c:4749:1)
   314us [-ENOTSUPP]  __sys_bpf+0x867                      (kernel/bpf/syscall.c:4689:9)
   310us [-ENOTSUPP]  bpf_raw_tracepoint_open+0x9a         (kernel/bpf/syscall.c:3063:6)
!  306us [-ENOTSUPP]  bpf_tracing_prog_attach

As you can see from the above, function call trace mode lets you peer into the exact control flow inside the kernel, restricted to functions selected by the allow/deny lists and taking into account all the other filters (process name, latency, etc.). In addition to the call sequence, function results and durations are emitted.

Note that leaf function calls (e.g., bpf_link_prime above) are collapsed if they don't call any other traced functions. This makes the call trace more readable and compact. Collapsed calls are marked with ↔, while otherwise function entry is marked with → and function exit with ←.
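
To make the collapsing rule concrete, here is a rough Python sketch of how such a trace could be rendered from a call tree. It only models the marker logic described above (→ entry, ← exit, ↔ collapsed leaf); the Call type and render function are hypothetical, and column alignment and durations are left out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Call:
    name: str
    result: str
    children: List["Call"] = field(default_factory=list)


def render(call: Call, depth: int = 0) -> List[str]:
    """Render a call tree; calls without children collapse to one '↔' line."""
    indent = "    " * depth
    if not call.children:
        return [f"{indent}↔ {call.name} [{call.result}]"]
    lines = [f"{indent}→ {call.name}"]
    for child in call.children:
        lines.extend(render(child, depth + 1))
    lines.append(f"{indent}← {call.name} [{call.result}]")
    return lines


if __name__ == "__main__":
    tree = Call("bpf_tracing_prog_attach", "-ENOTSUPP", [
        Call("bpf_link_prime", "0"),
        Call("bpf_link_cleanup", "void"),
    ])
    print("\n".join(render(tree)))
```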

This mode perfectly augments stack trace output for deeper inspection of kernel internals, and is also great for discovering how kernel internals work in general.

retsnoop - retsnoop v0.8.3

Published by anakryiko over 2 years ago

Add ability to filter functions by kernel modules.

The general glob format is now <name-glob> [<module-glob>], where the module glob is optional. This makes it possible to, e.g., attach to all kprobes within some module: '* [fuse]' will attach to all the functions within the fuse module. Note that the module glob is also a glob, so one can capture multiple modules with one glob, e.g. -a '* [kvm*]' will capture functions defined in the kvm and kvm_intel modules.
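
As an illustration of the <name-glob> [<module-glob>] format, the sketch below parses such a spec and matches it against a (function, module) pair. It is a Python approximation, not retsnoop's matcher: parse_glob and matches are hypothetical names, Python's fnmatch stands in for retsnoop's own glob logic, and the treatment of functions without a module (they never match a spec that names a module) is an assumption.

```python
import re
from fnmatch import fnmatchcase
from typing import Optional, Tuple

# "<name-glob>" optionally followed by "[<module-glob>]".
_SPEC_RE = re.compile(
    r"^\s*(?P<name>[^\s\[\]]+)(?:\s*\[(?P<module>[^\s\[\]]+)\])?\s*$"
)


def parse_glob(spec: str) -> Tuple[str, Optional[str]]:
    """Split a spec into (name_glob, module_glob or None)."""
    m = _SPEC_RE.match(spec)
    if m is None:
        raise ValueError(f"malformed glob spec: {spec!r}")
    return m.group("name"), m.group("module")


def matches(spec: str, func: str, module: Optional[str] = None) -> bool:
    """Check whether a function (and its module, if any) matches the spec."""
    name_glob, module_glob = parse_glob(spec)
    if not fnmatchcase(func, name_glob):
        return False
    if module_glob is None:
        return True  # assumption: no module glob means "any module or none"
    return module is not None and fnmatchcase(module, module_glob)


if __name__ == "__main__":
    print(matches("* [kvm*]", "some_func", "kvm_intel"))  # some_func is a placeholder name
    print(matches("* [fuse]", "vfs_read"))
```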

retsnoop - retsnoop v0.8.2

Published by anakryiko over 2 years ago

A few more usability improvements:

allow tracing functions from kernel modules;
default LBR mode to any_return, instead of the more detailed any mode;
improve error and return value output logic to take into account pointer vs integer vs error, if possible;
slight clean up of LBR output.

retsnoop - retsnoop v0.8.1

Published by anakryiko over 2 years ago

A few usability improvements:

improved glob support: '?' is now supported, and multiple '*' can appear anywhere in the pattern;
fixed multi-kprobe kernel support detection (take into account CONFIG_FPROBE);
filter out __ftrace_invalid_address___xxx fake kprobe entries;
don't filter out valid bpf_prog_xxx kernel functions from stack traces.

retsnoop - retsnoop v0.8

Published by anakryiko over 2 years ago

A few pretty big usability improvements.

More flexible and compact stack trace formatting. Retsnoop now tries to determine the minimal correct size of each output column, keeping the stack trace aligned while using a minimal amount of horizontal space.
The same logic is reused for formatting LBR stacks, which also allows the from and to branches to be emitted "horizontally", instead of one after the other as before. This improves comprehension significantly.
Retsnoop now recognizes symbolic aliases for LBR flags, making it much easier to tune what kind of LBR data to capture. E.g., --lbr=any_return will capture only returns from functions, allowing you to see further back into an unknown sequence of kernel function calls. This is very useful when trying to discover what's going on without knowing the particular area of the kernel you are trying to debug. By default retsnoop is effectively using --lbr=any.
--lbr-max-count N was added to limit the number of last LBR records emitted. It's not always necessary to see all 32 of them; the last 5 or so might be more than enough.

With all the above changes, here's an example of one captured error with LBR stack traces included. Retsnoop is run as:

$ sudo ./retsnoop -e '*sys_bpf' -a ':kernel/bpf/syscall.c' -n simfail --lbr=any_return --lbr-max-count=5

Failure is simulated with simfail:

$ sudo ./simfail bpf-bad-map-lookup-value

And here's the result:

09:24:54.846 PID 336615 (simfail):
                    entry_SYSCALL_64_after_hwframe+0x44  (arch/x86/entry/entry_64.S:112:0)
                    do_syscall_64+0x2d                   (arch/x86/entry/common.c:46:12)
    34us [-ENOENT]  __x64_sys_bpf+0x1c                   (kernel/bpf/syscall.c:4749:1)
    27us [-ENOENT]  __sys_bpf+0x1a42                     (kernel/bpf/syscall.c:4632:9)
                    . map_lookup_elem                    (kernel/bpf/syscall.c:1113:5)
!    7us [-ENOENT]  bpf_map_copy_value

[#07] migrate_disable+0x3c       (kernel/sched/core.c:1755:1)      ->  bpf_map_copy_value+0x31       (kernel/bpf/syscall.c:241:2)
[#07]                                                                  . bpf_disable_instrumentation (include/linux/bpf.h:1453:2)

[#06] array_map_lookup_elem+0x24 (kernel/bpf/arraymap.c:168:1)     ->  bpf_map_copy_value+0x1ed      (kernel/bpf/syscall.c:269:10)

[#05] rcu_read_unlock_strict+0x5 (kernel/rcu/tree_plugin.h:797:1)  ->  bpf_map_copy_value+0x18c      (include/linux/rcupdate.h:724:2)

[#04] migrate_enable+0x59        (kernel/sched/core.c:1783:1)      ->  bpf_map_copy_value+0x9e       (kernel/bpf/syscall.c:288:2)
[#04]                                                                  . maybe_wait_bpf_programs     (kernel/bpf/syscall.c:170:49)

[#03] bpf_map_copy_value+0xba    (kernel/bpf/syscall.c:291:1)      ->  __kretprobe_trampoline+0x0

retsnoop - retsnoop v0.7

Published by anakryiko over 2 years ago

Two major features:

Extremely fast multi-kprobe is used if the kernel supports it (automatically; needs a 5.18+ kernel). This speeds up attachment and especially detachment time immensely. It is hard to overstate this: it's seconds and potentially minutes (if attaching to a lot of functions) versus a couple of milliseconds with multi-kprobe.
Error filter support. Use -x ENOMEM to report stacks that return -ENOMEM. Use -X ENOMEM to skip stacks that report -ENOMEM. NULL is an error, so -x NULL and -X NULL are also supported. You can combine multiple -x and -X options together. -X takes precedence (i.e., if some error is disabled, enabling it with -x won't help).
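
The precedence rule for -x and -X can be summarized in a few lines. This is a hedged Python model of the behavior described above, not retsnoop's filter code; should_report is a hypothetical name, and the assumption that every error is reported when no -x is given reflects the default behavior of emitting all erroring stacks.

```python
from typing import Iterable


def should_report(err: str, allow: Iterable[str] = (), deny: Iterable[str] = ()) -> bool:
    """Decide whether a stack ending in `err` is reported.

    allow models the -x options, deny models the -X options.
    """
    allow, deny = set(allow), set(deny)
    if err in deny:       # -X takes precedence over -x
        return False
    if allow:             # with any -x given, only listed errors pass
        return err in allow
    return True           # assumption: no -x means all errors are reported


if __name__ == "__main__":
    print(should_report("ENOMEM", allow=["ENOMEM"]))
    print(should_report("ENOMEM", allow=["ENOMEM"], deny=["ENOMEM"]))
    print(should_report("NULL", deny=["NULL"]))
```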

retsnoop - retsnoop v0.6

Published by anakryiko almost 3 years ago

Lots of quality of life improvements:

Ability to specify functions by their source code locations. Use the following syntax in -e, -a and -d: ':fs/btrfs/*.c'.
Safer kprobe mode is now the default. It can be overridden with the -F argument.
Symbolization with line info and inline functions is now on by default, no more need to specify -ss. If the vmlinux image can't be located, retsnoop falls back to -s none (-sn), meaning no extra symbolization beyond using /proc/kallsyms.
Dry run mode added (--dry-run), which will do everything but load and attach BPF programs. Very useful to figure out what retsnoop will try to trace without risking affecting the system.
-V (--version) now prints retsnoop version.

retsnoop - retsnoop v0.5.1

Published by anakryiko almost 3 years ago

Fixes potential issues with the LBR perf event by using a hardware event. No other changes compared to v0.5.

retsnoop - retsnoop v0.5

Published by anakryiko about 3 years ago

A huge milestone for retsnoop: LBR capturing!

When the kernel supports capturing LBR entries from a BPF kprobe/fexit program,
retsnoop will capture such LBR records and emit them after the captured stack trace.
This makes it possible to trace back inside the last failed/traced function, including logic inside
inlined functions, and to see where exactly inside a potentially large function
the error happened. Use the --lbr flag to enable this feature. If the kernel doesn't support
this feature, retsnoop will report this with a warning, visible in verbose mode (-v).

Relevant kernel feature was added by Song Liu in
Linux kernel commit 856c02dbce4f ("bpf: Introduce helper bpf_get_branch_snapshot").

retsnoop - retsnoop v0.4.1-alpha

Published by anakryiko about 3 years ago

Force line-oriented output in stdout.

retsnoop - retsnoop v0.4-alpha

Published by anakryiko over 3 years ago