chicken-gun

stressing your system, a chicken at a time


A chicken gun is a large-diameter, compressed-air cannon used to fire dead chickens at aircraft components in order to simulate high-speed bird strikes during the aircraft's flight. (source: Wikipedia)

Here you can find cg, a tool aimed at putting very targeted load on specific parts of a machine to verify:

  • what happens when specific problematic scenarios occur, and
  • if we're properly collecting telemetry from our systems.


Scenarios

cpu

Exercises CPU time spent on userspace code by creating n threads that each run a busy loop indefinitely.

# run four threads with busyloops in them.
cg cpu --threads 4
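A rough plain-shell approximation of what this scenario does (a sketch only; cg itself spawns native threads, while here each busy loop is a backgrounded subshell):

```shell
# spin two background busy loops, let them burn CPU for a moment,
# then clean up. each `while :; do :; done` consumes a full core.
while :; do :; done & pid1=$!
while :; do :; done & pid2=$!

sleep 1

kill "$pid1" "$pid2"
```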

Once the scenario runs, we can look at CPU utilization metrics to verify that we're really exercising the CPUs, but first, let's see where we can gather that info from:

cat /proc/stat
cpu  15336 204 1036 1949794 774 0 133 0 0 0  #  -- aggregate over all cpus
cpu0 5135  42  370  649932  248 0 21  0 0 0
cpu1 5106  162 315  649920  275 0 102 0 0 0
#     |    |   |    |       |   |  |  | | |
#     |    |   |    |       |   |  |  | | *guest_nice
#     |    |   |    |       |   |  |  | *guest
#     |    |   |    |       |   |  |  *steal
#     |    |   |    |       |   |  *softirq
#     |    |   |    |       |   *irq
#     |    |   |    |       *iowait
#     |    |   |    *idle
#     |    |   *system
#     |    *nice
#     *user    

Each number counts the jiffies (units of USER_HZ, 100 ticks per second on x86) that the CPU spent in that mode since the system booted.

metric   description
------   -----------
user     normal processes executing in user mode
nice     niced processes executing in user mode
system   processes executing in kernel mode
idle     idle
iowait   time during which a particular CPU was idle while at least one
         disk I/O request, issued by a task scheduled on that CPU (at the
         time it generated the request), was outstanding
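Since those counters are cumulative, turning them into a utilization number means sampling twice and diffing (a sketch using the aggregate cpu line, with fields named as in the diagram above):

```shell
# read user, nice, system and idle from the aggregate "cpu" line,
# wait a second, read again: the deltas are the jiffies spent in
# each mode during that interval.
read -r _label u1 n1 s1 i1 _rest < /proc/stat
sleep 1
read -r _label u2 n2 s2 i2 _rest < /proc/stat

busy=$(( (u2 - u1) + (n2 - n1) + (s2 - s1) ))
idle=$(( i2 - i1 ))

echo "busy=${busy} idle=${idle} jiffies over ~1s"
```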


context-switches

In this scenario, threads constantly get their execution migrated across n cores, context switching all the time.

As a result, we end up with:

  • not much userspace CPU consumption,
  • very high per-task context switch numbers, and
  • high kernel-space CPU utilization for migration/* processes.
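Those per-task counters are exposed for any process, and every blocking call bumps the voluntary one. A quick self-check on the current shell (reading /proc/$$/status rather than /proc/self/status, since /proc/self would resolve to awk's own pid):

```shell
# a blocking `sleep` forces the shell to wait on its child, which
# shows up as at least one voluntary context switch.
v1=$(awk '/^voluntary/ {print $2}' /proc/$$/status)
sleep 1
v2=$(awk '/^voluntary/ {print $2}' /proc/$$/status)

echo "voluntary switches during the sleep: $((v2 - v1))"
```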

For instance, looking at the results of sampling a mostly idle system that only has cg context-switches running for 30s:

# take samples of the whole callgraph 99 times a second for every
# cpu in the machine while running the `sleep` command.
#
#    -F,--freq        Profile at this frequency.
#
#    -a,--all-cpus    System-wide collection from all CPUs 
#                     (default if no target is specified).
#
#    -g               Enables call-graph (stack chain/backtrace) recording.
#
perf record --freq 99 -a -g sleep 30


# `perf-script` reads perf.data (created by perf record) and displays 
# trace output.
#
# With the traces generated by `perf script`, `stackcollapse` then 
# collapses that multiline output of samples into semicolon-separated single
# lines, appropriate for `flamegraph.pl` to consume.
#
# From those collapsed stack traces, `flamegraph.pl` generates the
# `svg` with the flamegraph visualization.
perf script | \
	stackcollapse-perf.pl | \
	flamegraph.pl --hash --width=1000 > \
	context-switches-flamegraph.svg

Now, looking at the number of context switches as reported by procfs, we can see how aggressive we are in terms of context switching:

cd /proc/$(cat /tmp/cg.pid)/task
find . -name "status" | xargs -n1 grep 'ctxt'
voluntary_ctxt_switches:	4
nonvoluntary_ctxt_switches:	1
voluntary_ctxt_switches:	214
nonvoluntary_ctxt_switches:	1590249
voluntary_ctxt_switches:	232
nonvoluntary_ctxt_switches:	1590307
voluntary_ctxt_switches:	240
nonvoluntary_ctxt_switches:	1590386
voluntary_ctxt_switches:	242
nonvoluntary_ctxt_switches:	1590412
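To get a single process-wide number instead of per-task pairs, the same files can be summed (shown here against the current shell, $$, as a stand-in for the cg pid):

```shell
# status fields are tab-separated ("voluntary_ctxt_switches:\t4"),
# so summing field 2 of every ctxt line across all of the process'
# tasks gives the total number of context switches.
awk -F'\t' '/ctxt/ {sum += $2} END {print sum}' /proc/$$/task/*/status
```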

If we're even more curious and want to know on which CPUs the threads ran, we can then look at a tailored output of perf script:

# filtering the system-wide samples, look at only those
# for the `cg` command, then output the corresponding `cpu`
# where each `tid` ran.
perf script --fields comm,cpu,tid | awk '/cg/ {print $2, $3}'

Something interesting happens when exercising context switches: we can't see the overhead associated with them by looking only at user and system CPU utilization, even though 1 - idle reveals that our CPUs are busy with such activity.

pids

Creates n different processes under the same process group as the parent cg process started by cg pids.

cg pids -n 5

# check the process group
pstree -p $(cat /tmp/cg.pid)
cg(2016)─┬─exe(2017)
         ├─exe(2018)
         ├─exe(2019)
         ├─exe(2020)
         └─exe(2021)

Under the hood, cg pids creates child processes from its own image (/proc/self/exe), specifying the hidden cg sleep - one that just sleeps forever - as their command.

This has the effect of having several processes (not just threads) under the same process group as cg.
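The process-group relationship is easy to observe with any parent/child pair: in a non-interactive shell (no job control), a backgrounded child inherits its parent's process group, just like cg's exe children do (a sketch, assuming a standard ps is available):

```shell
# spawn a child and compare its process group id with the shell's:
# without job control, both belong to the same group.
sleep 30 & child=$!

parent_pgid=$(ps -o pgid= -p "$$" | tr -d ' ')
child_pgid=$(ps -o pgid= -p "$child" | tr -d ' ')

echo "parent pgid=${parent_pgid} child pgid=${child_pgid}"

kill "$child"
```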

Although Linux does not provide us with a single file containing the exact number of processes, we can count the numeric directories that getdents(2) on /proc returns:

ls /proc/ | awk '/^[0-9]+$/'
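For example, counting those numeric entries before and after spawning a couple of children (each numeric directory is one process; threads live under /proc/&lt;pid&gt;/task and are not counted here):

```shell
# count processes, spawn two children, count again: the second
# count should be at least as large as the first.
before=$(ls /proc | grep -c '^[0-9][0-9]*$')

sleep 30 & p1=$!
sleep 30 & p2=$!

after=$(ls /proc | grep -c '^[0-9][0-9]*$')
echo "processes: ${before} -> ${after}"

kill "$p1" "$p2"
```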

files-open

By creating n files under a particular directory and keeping them open, this scenario can be used to verify, for instance, that per-process limits are really enforced.

For example:

# check out what the current limit for the current process is
cat /proc/$$/limits
Limit                Soft Limit     Hard Limit    Units
...
Max resident set     unlimited      unlimited     bytes
Max open files       1024           1048576       files
Max locked memory    16777216       16777216      bytes
...

# configure the current process to have a limit
# of 20 open files
ulimit -n 20


# verify that we indeed changed the limit for the current process
cat /proc/$$/limits
Limit                Soft Limit     Hard Limit    Units
...
Max resident set     unlimited      unlimited     bytes
Max open files       20             20            files
Max locked memory    16777216       16777216      bytes
...

# see that we can't go past that limit:
cg files-open -d /tmp -n 30
thread 'main' panicked at 'failed to create /tmp/17: Too many open files (os error 24)', src/fs.rs:18:25
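Since ulimit only affects the calling shell and its children, the lowered limit can also be scoped to a subshell, leaving the parent untouched (a sketch of the soft-limit behavior shown above):

```shell
# lower the soft limit inside a subshell and read it back; anything
# spawned from that subshell (like cg) would see the lowered value,
# while the parent shell keeps its original limit.
sub_limit=$( (ulimit -n 20; ulimit -n) )
parent_limit=$(ulimit -n)

echo "subshell limit=${sub_limit} parent limit=${parent_limit}"
```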

To check how many files the process currently has open, we can inspect its /proc/$pid/fd:

# create and open a number of open files that we're allowed to handle
cg files-open -d /tmp -n 10
ls /proc/$(cat /tmp/cg.pid)/fd | wc -l
13	# < 10 files + stdin, stdout, and stderr.

tcp-transmitter and tcp-receiver

Respectively, sends and receives bytes from and to files as quickly as possible, spending as little userspace time as possible (both leverage splice heavily).

# in one terminal
cg tcp-receiver -a 127.0.0.1:1337

# in another terminal
cg tcp-transmitter -a 127.0.0.1:1337

# in yet another terminal
sar -n DEV 1
21:23:14        IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s   %ifutil
21:23:15       enp0s3      4.00      4.00      0.23      0.40      0.00      0.00      0.00      0.00
21:23:15           lo 240386.00 240386.00 5263101.63 5263101.63      0.00      0.00      0.00      0.00
21:23:15       enp0s8      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
21:23:15      docker0      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00

top
...
%Cpu(s):  1.5 us, 31.5 sy,  0.0 ni, 60.0 id,  0.0 wa, ...
	 *------* *-----*

In a container

Just like in a regular bare-metal or virtual machine, cg can run in containerized environments too.

A container image can be found on DockerHub: cirocosta/chicken-gun.

docker run cirocosta/chicken-gun cpu --threads 4

LICENSE

MIT - See ./LICENSE.