Horizon chart for CPU/GPU/Neural Engine utilization monitoring on Apple M1/M2 and NVIDIA GPUs on Linux
MIT License
cubestat is a command-line utility that monitors system metrics in horizon chart format. It was originally created for Apple M1/M2 devices, but it also supports Linux with NVIDIA GPUs, including the Google Colab environment. Numerous tools exist for tracking system metrics, yet horizon charts stand out due to their high information density, which makes it possible to display many time series on a single screen.
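To illustrate the chart format itself (this is a simplified sketch, not cubestat's actual rendering code), a time series can be compressed onto a single terminal row by mapping each sample to one of a few block characters:

```python
# Minimal horizon-chart sketch: map each sample to a block character.
# Illustration of the format only, not cubestat's implementation.

BLOCKS = " ▁▂▃▄▅▆▇█"  # 9 levels, from empty to full block

def horizon_row(samples, max_value=100.0):
    """Render one time series as a single-row horizon chart string."""
    row = []
    for v in samples:
        # Clamp to [0, max_value], then pick one of the 9 block levels.
        frac = max(0.0, min(v, max_value)) / max_value
        row.append(BLOCKS[round(frac * (len(BLOCKS) - 1))])
    return "".join(row)

if __name__ == "__main__":
    cpu_util = [0, 10, 35, 60, 90, 100, 70, 40, 15, 5]
    print(horizon_row(cpu_util))
```

Real horizon charts additionally layer color bands to pack even more resolution into the same row; the block-character mapping above is the simplest version of the idea.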
Let's start with an example:
https://github.com/okuvshynov/cubestat/assets/661042/8e1e405e-ca61-4ffb-bedb-e04eb33f8bc2
In the clip above we see Mixtral-8x7B inference on a MacBook Air with the feed-forward layers offloaded to SSD. We can notice somewhat low GPU utilization and 2+ GB/s of data read from disk, as the weights have to be fetched, but plenty of free RAM (we are actually able to serve an almost 100 GB model at fp16 precision on a 24 GB machine, even if very slowly).
We can also clearly see the transition from model loading (CPU utilization, disk writes for model preprocessing) to model inference (disk reads, GPU utilization going up, CPU going down).
Currently cubestat reports CPU, GPU, and ANE utilization, swap usage, network and disk IO, and power consumption.

ANE utilization is derived from the power consumption reported by powermetrics (see man powermetrics). It is an estimate, but it seems to work well enough as a proxy for ANE utilization, and it is shown as a percentage.

Known limitations:
powermetrics must be run with sudo. You don't need to run cubestat itself with sudo, but you'll be asked for the sudo password when cubestat launches powermetrics. If you are comfortable doing so, you can add powermetrics to /etc/sudoers (your_user_name ALL=(ALL) NOPASSWD: /usr/bin/powermetrics) and avoid this.

Installation:

```
% pip install cubestat
```

or

```
% pip install cubestat[cuda]  # for instances with NVIDIA GPUs
```
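On macOS, a monitoring tool can launch powermetrics as a subprocess and parse its periodic output. The snippet below is a hypothetical sketch of that pattern; the exact samplers and flags cubestat itself uses may differ:

```python
import subprocess

def powermetrics_cmd(interval_ms=1000):
    """Build a powermetrics command line (hypothetical sketch; cubestat's
    actual invocation may use different samplers)."""
    return [
        "sudo", "/usr/bin/powermetrics",
        "-i", str(interval_ms),      # sampling interval in milliseconds
        "--samplers", "cpu_power",   # which metric groups to sample
    ]

def start_powermetrics(interval_ms=1000):
    # cubestat itself doesn't need sudo; only the powermetrics child does,
    # which is why the sudoers NOPASSWD entry above avoids the prompt.
    return subprocess.Popen(powermetrics_cmd(interval_ms),
                            stdout=subprocess.PIPE, text=True)
```

The parent process then reads the child's stdout line by line and extracts the metrics it charts.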
```
usage: cubestat [-h] [--refresh_ms REFRESH_MS] [--buffer_size BUFFER_SIZE]
                [--view {off,one,all}] [--cpu {all,by_cluster,by_core}]
                [--gpu {collapsed,load_only,load_and_vram}]
                [--swap {show,hide}] [--network {show,hide}]
                [--disk {show,hide}] [--power {combined,all,off}]

options:
  -h, --help            show this help message and exit
  --refresh_ms REFRESH_MS, -i REFRESH_MS
                        Update frequency, in milliseconds
  --buffer_size BUFFER_SIZE
                        How many datapoints to store. Having it larger than
                        the screen width is a good idea, as the terminal
                        window can be resized
  --view {off,one,all}  legend/values/time mode. Can be toggled by pressing v.
  --cpu {all,by_cluster,by_core}
                        CPU mode - show all cores, only cumulative by
                        cluster, or both. Can be toggled by pressing c.
  --gpu {collapsed,load_only,load_and_vram}
                        GPU mode - hidden, showing the load of all GPUs, or
                        showing load and VRAM usage. Can be toggled by
                        pressing g.
  --swap {show,hide}    Show swap usage. Can be toggled by pressing s.
  --network {show,hide}
                        Show network IO. Can be toggled by pressing n.
  --disk {show,hide}    Show disk read/write. Can be toggled by pressing d.
  --power {combined,all,off}
                        Power mode - off, showing CPU/GPU/ANE breakdown, or
                        showing combined usage. Can be toggled by pressing p.
```
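The --buffer_size advice above (keep more datapoints than the current terminal width, since the window can be resized) maps naturally onto a fixed-capacity ring buffer. A sketch with names of my own choosing, not cubestat's internals:

```python
from collections import deque

class MetricSeries:
    """Fixed-capacity history for one metric (illustrative sketch)."""

    def __init__(self, buffer_size=1000):
        # A deque with maxlen discards the oldest sample automatically,
        # so memory stays bounded no matter how long the tool runs.
        self.samples = deque(maxlen=buffer_size)

    def append(self, value):
        self.samples.append(value)

    def last(self, width):
        """Return the newest `width` samples, e.g. the terminal width."""
        return list(self.samples)[-width:]

s = MetricSeries(buffer_size=5)
for v in range(10):
    s.append(v)
print(s.last(3))  # only the newest samples survive
```

Rendering then only slices the newest `width` samples, so widening the terminal immediately reveals the extra history that was buffered beyond the old width.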
Interactive commands:
https://github.com/okuvshynov/cubestat/assets/661042/c5e0750d-9bbd-4636-a1ea-71cc75ebbadb
We see a workload unevenly distributed across the 4 installed GPUs. By pressing 'g' we can toggle the view mode to show aggregate load, per-GPU load, or per-GPU load and VRAM usage.
A few notes on what this metric actually represents: the utilization we show is essentially the current power consumption reported by powermetrics. To convert it to a percentage, we divide it by a maximum value observed in experimentation. Keep in mind when reading this metric that it is an approximation, not an exact hardware counter.
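The conversion described above, dividing the reported power draw by an experimentally observed maximum, amounts to a one-liner. A sketch, where the peak value is made up for illustration:

```python
def power_to_percent(power_mw, peak_mw):
    """Convert a power reading (mW) to a 0-100% utilization proxy.

    peak_mw is an experimentally observed maximum, not a hardware spec,
    so readings can occasionally exceed it; clamp to keep the chart sane.
    """
    return min(100.0, 100.0 * power_mw / peak_mw)

# Hypothetical peak observed during experimentation:
ANE_PEAK_MW = 8000.0
print(power_to_percent(2000.0, ANE_PEAK_MW))  # → 25.0
```

Because the denominator is empirical, the same workload can show slightly different percentages across machines or OS versions.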
We can run cubestat on Google Colab instances to monitor GPU/CPU/IO usage.
First cell:

```
!pip install cubestat[cuda]
!pip install colab-xterm
%load_ext colabxterm

# export TERM=xterm-256color  <---- RUN THIS IN TERMINAL
# cubestat                    <---- RUN THIS IN TERMINAL
```
Start xterm:

```
%xterm
```
In the terminal, configure 256 colors and start cubestat:

```
# export TERM=xterm-256color
# cubestat
```
Example notebook: colab example