unix-socket-backlog example checking in CI (#417)
pkg-config to find libbpf dependencies (#415)
Published by bobrik 6 months ago
pprof support (#364)
errno decoder (#378)
padding to label decoder to skip struct holes (#376)
ext4dist example (#365)
xfsdist example (#368)
biolatency example (#373, #374, #375)
sock-trace and simplified example with socket cookies (#381)
cachestat into pre and post kernel 5.16 (#372)
/usr/share/hwdata/pci.ids for RedHat/Fedora/CentOS (#380)
sd_notify support when running under systemd (#382)
Published by bobrik 8 months ago
This is a big release that comes with a major new feature: Distributed Tracing via OpenTelemetry (#297).
You can find the full documentation in ./tracing.
For a quick demo, you can run everything locally with the provided Docker image:
all-in-one to provide an OpenTelemetry sink and UI:
docker run --rm -it --net host jaegertracing/all-in-one:1.54.0
Open Jaeger UI: http://localhost:16686/.
Build tracing demos from the root of the repo:
make tracing-demos
ebpf_exporter with a sock-trace example from the root of the repo:
docker run --rm -it --privileged --net host -e OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 -v $(pwd)/tracing:/tracing ghcr.io/cloudflare/ebpf_exporter:v2.4.0 --config.dir=examples --config.names=sock-trace
./tracing/demos/sock/demo
Refresh the Jaeger UI, select demo as the service, and click "Find Traces".
Observe a trace that includes both userspace spans produced by the demo component and kernel spans produced by ebpf_exporter.
We have more examples bundled, please see the docs.
Tracing support required us to take a few dependencies that needed a newer Go version, so we bumped the build requirement from go1.18 to go1.20.
Other changes:
softirq-latency-net-rx example (#349)
Published by bobrik 9 months ago
Published by bobrik 9 months ago
Published by bobrik 9 months ago
Published by bobrik 10 months ago
Highlights:
fanotify for a faster and more reliable cgroup monitoring (#244, #263, #264, #265, #266, #279, #288)

New examples:
icmp-ip example with inet_ip decoder (#251)
pci_vendor, pci_device, pci_class, pci_subclass decoders with examples (#255, #274)
kstack decoder with an example (#313)
unix-socket-backlog example (#284)
softirq-latency example (#300, #304)
softirq-latency-net-rx example that's an array based version of softirq-latency (#310)
cfs-throttling example (#311)
tcp-retransmit example (#318, #335)

Changes to examples:
jsonschema for examples and cleaned up unused keys (#314)
exp2zero histogram type for cases when 0 is a significant outcome and added tcp-syn-backlog-exp2zero example (#280)
uint decoder for very large numbers (#296)
tcp-syn-backlog-exp2zero example (#301)
increment_exp2zero_histogram helper macro for examples (#302)
{increment_map,increment_{exp2,exp2zero}_histogram}_nosync helper macros (#303, #305)
tcp-syn-backlog example with linear histogram (#306)
shrinklat example failure due to wrongly sized key (#319)
shrinklat example due to type mismatch (#327)
biolatency kernel version check after an upstream LTS backport (#309)

Build changes:
build-dynamic and build-static make goals (#241)
golangci-lint and fixed uncovered issues (#256, #259, #269, #270)
libbpf version on startup to prevent runtime errors (#247)
libbpf instructions (#262)
-race if available (#267)
dbhi/qus/action to more official docker/setup-qemu-action for CI builds (#272)
ebpf_exporter and ebpf_exporter_with_examples (#273)
libbpf dependencies in CI (#275)

Other changes:
linguist ignores for vmlinux.h files that were screwing language stats (#248)
.dockerignore for libbpf and built examples (#298)
perf_event from config definitions (#315)
Published by bobrik about 1 year ago
The best release yet! Syscalls, per-cpu maps, running with no elevated capabilities at runtime — it has it all.
kfree_skb example (#233, #234)
tp_btf in examples to remove the need for tracefs (#227)
--debug is enabled (#216)
--log.no-timestamps (#239)
clang-format config to enforce formatting on C code (#222)
vmlinux.h from 5.15.0-25 to 6.3.0-7 and generation instructions (#224)
Published by bobrik over 1 year ago
tcp-window-clamp example (#172)
CFLAGS to examples (#172)
HistogramBucketType (#178)
-mcpu to v3 (#182)
golangci-lint to latest (#192)
Published by bobrik almost 2 years ago
ebpf_exporter v2 is here!
This release comes with a bunch of breaking changes (all for the better!), so be sure to read the release notes below.
First and foremost, we migrated from BCC to libbpf. BCC has served us well over the years, but it has a major drawback: it compiles eBPF programs at runtime, which requires a compiler and kernel headers and can fail due to kernel discrepancies between hosts and kernel versions. It was hard to do static linking with BCC, so we ended up providing a binary linked against an older libc, for which you had to provide your own libbcc (which could also break due to unstable ABI).
With libbpf all these problems go away:
With libbpf and CO-RE you can Compile Once and Run Everywhere, worrying less about runtime failures.
libbpf, so we now provide a statically compiled binary that you can use anywhere with no dependencies. We also have a Dockerfile in the repo (not yet published on Docker Hub) if you're inclined to use that, and it's easier to run than ever.
Big thanks to @wenlxie for doing the bulk of the work on porting to libbpf in #130. Another big thanks to @aquasecurity for their work on libbpfgo, which made it a lot easier for us to switch.
In the BCC repo itself there's an effort to migrate programs from BCC to libbpf and you can see it here:
The programs above can serve as inspiration for what ebpf_exporter can provide for you as a metric.
Now to config changes. Previously you needed to make one big YAML config with all your metric descriptions and metrics intermingled. Now each logical program is called a config (a .yaml file) and each config has a dedicated eBPF ELF object (a .bpf.o file compiled from a .bpf.c file). When you start ebpf_exporter, you need to give it the path to the directory with your configs and tell it which configs to load. This allowed us to greatly flatten and simplify the configs, and it lets you use simpler tooling to configure what ebpf_exporter should enable.
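As a rough illustration of the layout described above, the sketch below derives the two files that belong to a config name inside a config directory. The helper name config_path is hypothetical and is not part of ebpf_exporter (which is written in Go); only the .yaml and .bpf.o suffixes and the examples/sock-trace names come from these notes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not an ebpf_exporter API): writes
 * "<dir>/<name><suffix>" into out.
 * Returns 0 on success, -1 if the result does not fit into out.
 */
static int config_path(char *out, size_t out_len, const char *dir,
                       const char *name, const char *suffix)
{
    assert(out != NULL && dir != NULL && name != NULL && suffix != NULL);

    int n = snprintf(out, out_len, "%s/%s%s", dir, name, suffix);

    /* snprintf reports the length it wanted; anything >= out_len was cut. */
    return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}
```

With --config.dir=examples and --config.names=sock-trace, calling the helper with the ".yaml" and ".bpf.o" suffixes would yield examples/sock-trace.yaml and examples/sock-trace.bpf.o, assuming both files sit side by side as they do in the examples directory.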
Having eBPF C code in separate files also allows you to use your regular tooling to build eBPF ELF objects. In the examples directory you will find a collection of our example configs along with a Makefile to build the eBPF code. The expectation is that you would replicate something similar for your internal configs, and you have all the needed bits and pieces provided for you to copy and adapt. We provide vmlinux.h for both x86_64 (aka amd64) and aarch64 (aka arm64).
Having separate .bpf.o files allows you to compile not just C code, but anything that would produce a valid eBPF ELF object. We tried with Rust, but unsuccessfully. Please feel free to send a PR if you have better luck with it. We still expect that the majority of people will use plain old C, since that's what libbpf mainly supports and has a lot of examples for.
Since programs for configs need to be compiled in advance, we compile them as part of a CI job, allowing us to spot mistakes early.
You no longer need to describe how to attach your eBPF programs in the config; it all happens in code. Take the timers code as an example:
SEC("tracepoint/timer/timer_start")
int do_count(struct trace_event_raw_timer_start* ctx)
We use the libbpf-provided SEC macro to tell what to attach to, which in this case is the timer:timer_start tracepoint. You can use any SEC that libbpf provides (there are many) and it should work out of the box, including uprobe, usdt and fentry (the latter currently requires a kernel patch on aarch64).
We piggyback on libbpf for most of the stuff with SEC, with the only exception being perf_event. For that we have a custom handler allowing you to set the type, config, and frequency of the event you want to trace. Below is type=HARDWARE, config=PERF_COUNT_HW_CACHE_MISSES at 1Hz from the llcstat example:
SEC("perf_event/type=0,config=3,frequency=1")
int on_cache_miss(struct bpf_perf_event_data *ctx)
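To make the section name format above concrete, here is a minimal userspace sketch of how such a string could be split into its three numbers. It is not the actual ebpf_exporter handler (which lives in Go); the struct and function names are hypothetical. The values 0 and 3 correspond to PERF_TYPE_HARDWARE and PERF_COUNT_HW_CACHE_MISSES in linux/perf_event.h.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical holder for the parsed values, not an ebpf_exporter type. */
struct perf_event_spec {
    unsigned int type;            /* e.g. 0 = PERF_TYPE_HARDWARE */
    unsigned long long config;    /* e.g. 3 = PERF_COUNT_HW_CACHE_MISSES */
    unsigned long long frequency; /* samples per second */
};

/*
 * Parses "perf_event/type=<n>,config=<n>,frequency=<n>".
 * Returns 0 on success, -1 if the string does not match exactly.
 */
static int parse_perf_event_section(const char *sec, struct perf_event_spec *out)
{
    int consumed = 0;

    assert(sec != NULL && out != NULL);

    /* %n records how many characters were consumed so that trailing
     * garbage after the frequency can be rejected. */
    int matched = sscanf(sec, "perf_event/type=%u,config=%llu,frequency=%llu%n",
                         &out->type, &out->config, &out->frequency, &consumed);

    if (matched != 3 || sec[consumed] != '\0')
        return -1;

    return 0;
}
```

Parsing "perf_event/type=0,config=3,frequency=1" fills the struct with type 0, config 3 and frequency 1, while a regular libbpf section such as "tracepoint/timer/timer_start" is rejected and would be left to libbpf.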
With uprobe support we also provide a way for you to run some code when your program is attached:
SEC("uprobe//proc/self/exe:post_attach_mark")
int do_init()
There's a post_attach_mark() function in ebpf_exporter that runs immediately after all configs are attached. In the bpf-jit example we use it to initialize a metric that would otherwise require a probe to run, which might take a while.
We now allow loose program attachment. Previously all programs had to be attached successfully for ebpf_exporter to run; now we allow failures and export a metric indicating whether each program was attached. This way you can use alerting to detect when this happens, while not sacrificing unrelated configs. This is handy if your programs attach to something that might be missing from some kernels, like a static function that is sometimes not visible. We used it in our cachestat example.
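The idea of loose attachment can be sketched in a few lines: try every program, remember the outcome per program, and keep going on failure. Everything below is a hypothetical model for illustration (the types, attach_all_loose, and the two stand-in attach functions); the real logic lives in ebpf_exporter's Go code and libbpf.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical attach callback: returns 0 on success, non-zero on failure. */
typedef int (*attach_fn)(void);

/* Hypothetical per-program record, not an ebpf_exporter type. */
struct program {
    const char *name;
    attach_fn attach;
    int attached; /* 1 if attached, 0 if not; would back a per-program gauge */
};

/*
 * Attempts to attach every program and records the result instead of
 * aborting on the first failure. Returns the number of failed programs.
 */
static size_t attach_all_loose(struct program *progs, size_t count)
{
    size_t failures = 0;

    assert(progs != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        progs[i].attached = (progs[i].attach() == 0);
        if (!progs[i].attached)
            failures++;
    }

    return failures;
}

/* Stand-ins for real attachment: one succeeds, one simulates a missing symbol. */
static int attach_ok(void) { return 0; }
static int attach_missing(void) { return -1; }
```

A strict variant would return on the first failure; here the caller gets a failure count plus a 0/1 flag per program, which is the shape of data an alert on "program not attached" needs.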
Speaking of metrics, if you have the kernel.bpf_stats_enabled sysctl enabled, we now also report how many times each of your eBPF programs ran and how long it spent running, which might be handy if you want to get an idea of how long things take.
In code and for the debug endpoint we renamed "tables" to "maps" to match eBPF terminology. If you were using /tables for debugging, you should switch to /maps. Previously configs needed to specify which table metrics came from; now it's automatically inferred from the metric name itself.
We have updated our benchmark, which now includes fentry, so you can see how much faster it is than good old kprobe and how much overhead you should expect in general (it's not much).
All of these changes are reflected in the README, so if you start from scratch, you shouldn't worry. If you are currently using ebpf_exporter v1, it will take some work to upgrade. The good news is that the metrics you export do not need to change. Internally at Cloudflare we upgraded without any issues.
You may have noticed that previously ebpf_exporter took some time to start up due to the need to compile programs. Since this is no longer the case, you should expect much faster startup times now. For complex configs like biolatency you should also expect lower memory usage (we observed ~250MiB -> ~30MiB drop during the upgrade).
If you need some documents for getting up to speed with libbpf and CO-RE, here are three great blog posts from libbpf maintainer @anakryiko:
We hope you'll enjoy these changes. As usual, please let us know if you run into any issues.
Published by bobrik about 3 years ago
--version work (#121)
The binaries in this release require glibc 2.27 or newer. You need to have libbcc.so installed to run the binaries, and on Ubuntu or Debian it's in the libbpfcc package.
Published by bobrik about 3 years ago
gometalinter to golangci-lint (#87)
/ endpoint gracefully (#88)
mcevents example (#93)
libbcc in CI (#97)
gofmt on whole project (#105)
Published by bobrik almost 4 years ago
Type member name for PerfEvent struct (#73)
Published by bobrik almost 5 years ago
Published by bobrik almost 5 years ago
Published by bobrik about 5 years ago
Published by bobrik almost 6 years ago
Published by bobrik about 6 years ago
First tagged release. See the slides from the presentation explaining how this works.