An indispensable performance timer for Python
MIT License
Three general tools should be employed to understand the CPU performance of your Python code:

1. a profiler (e.g. `cProfile`), to broadly identify hot spots across the whole program
2. a micro-benchmark tool (e.g. `timeit`), to time an isolated snippet under controlled conditions
3. an instrumented timer, to measure a specific function or block of code during a real run of your program

PerfTimer is a humble instance of #3. It's the easiest way (least amount of fuss and effort) to get insight into the call count and execution time of a function or piece of code during a real session of your program.
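To sketch what an instrumented timer of type #3 does (this is just the underlying idea, not PerfTimer's implementation), a minimal context-manager timer using only the standard library might look like:

```python
import time

class MiniTimer:
    """Accumulate call count and total elapsed time for a code block."""
    def __init__(self, name):
        self.name = name
        self.count = 0
        self.total = 0.0

    def __enter__(self):
        self._t0 = time.perf_counter()
        return self

    def __exit__(self, *exc):
        self.total += time.perf_counter() - self._t0
        self.count += 1

timer = MiniTimer('demo')
for _ in range(3):
    with timer:
        sum(range(10_000))
print(f'{timer.name}: {timer.count} runs, {timer.total * 1e6:.1f} µs total')
```

The value such a utility adds over this sketch is in the statistics it keeps, the reporting, and keeping the per-observation overhead low.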
Use cases include quick, ad hoc timing during development as well as lightweight collection of execution metrics from real sessions of a deployed program.
It seems everyone has tried their hand at writing one of these timer utilities. Implementations can be found in public repos, snippets, and on PyPI; there's even a Python feature request. That's not counting all the proprietary and one-off instances.
Features of this library:

- reports mean, standard deviation, and maximum of observed times, not just a total
- pluggable observers, including `HistogramObserver` for percentile reporting
- automatic reporting at program exit, or logging via a custom function
- variants for multi-threaded code (`ThreadPerfTimer`) and Trio async code (`TrioPerfTimer`)
- low measurement overhead
Typical usage is to create a `PerfTimer` instance at the global scope, so that aggregate execution time is reported at program termination:
```python
from perf_timer import PerfTimer

_timer = PerfTimer('process thumbnail')

def get_thumbnail_image(path):
    img = cache.get_thumbnail(path)
    if not img:
        img = read_image(path)
        with _timer:
            img.decode()
            img.resize(THUMBNAIL_SIZE)
        cache.set_thumbnail(img)
    return img
```
When the program exits, assuming `get_thumbnail_image` was called several times, execution stats will be reported to stdout as follows:
```
timer "process thumbnail": avg 73.1 µs ± 18.0 µs, max 320.5 µs in 292 runs
```
To instrument an entire function or class method, use `PerfTimer` as a decorator:
```python
@PerfTimer('get thumbnail')
def get_thumbnail_image(path):
    ...
```
By default `PerfTimer` will track the average, standard deviation, and maximum of observed values. Other available observers include `HistogramObserver`, which reports (customizable) percentiles:
```python
import random
import time
from perf_timer import PerfTimer, HistogramObserver

_timer = PerfTimer('test', observer=HistogramObserver, quantiles=(.5, .9))
for _ in range(50):
    with _timer:
        time.sleep(random.expovariate(1/.1))
del _timer
```
output:

```
timer "test": avg 117ms ± 128ms, 50% ≤ 81.9ms, 90% ≤ 243ms in 50 runs
```
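The average and standard deviation in the reports above can be maintained in a single pass, without storing the individual samples; Welford's online algorithm is the standard technique for this. A standalone sketch of the idea (whether the default observer uses exactly this is an implementation detail):

```python
import math

class RunningStats:
    """Single-pass mean / std dev / max via Welford's online algorithm.

    Uses O(1) memory regardless of how many values are observed.
    """
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0          # sum of squared deviations from the mean
        self.max = float('-inf')

    def observe(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)
        if x > self.max:
            self.max = x

    @property
    def std_dev(self):
        # population standard deviation of the observations seen so far
        return math.sqrt(self._m2 / self.n) if self.n else 0.0

stats = RunningStats()
for x in (2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0):
    stats.observe(x)
print(stats.mean, stats.std_dev, stats.max)  # 5.0 2.0 9.0
```

Percentiles, by contrast, can't be maintained exactly in O(1) memory, which is why `HistogramObserver` costs more per observation.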
A custom logging function may be passed to the `PerfTimer` constructor:

```python
import logging

_logger = logging.getLogger()
_timer = PerfTimer('process thumbnail', log_fn=_logger.debug)
```
To minimize overhead, `PerfTimer` assumes single-thread access. Use `ThreadPerfTimer` in multi-thread scenarios:

```python
from perf_timer import ThreadPerfTimer

_timer = ThreadPerfTimer('process thumbnail')
```
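To see why thread safety costs extra, consider what a naive shared timer gets wrong: a single start timestamp would be clobbered by concurrent entries. A hypothetical sketch (not `ThreadPerfTimer`'s actual implementation) keeps a per-thread start time in `threading.local` and locks only the shared aggregates:

```python
import threading
import time

class ThreadSafeMiniTimer:
    """Each thread gets its own start timestamp; shared stats are locked."""
    def __init__(self):
        self._local = threading.local()   # per-thread start time
        self._lock = threading.Lock()     # guards the shared aggregates
        self.count = 0
        self.total = 0.0

    def __enter__(self):
        self._local.t0 = time.perf_counter()
        return self

    def __exit__(self, *exc):
        elapsed = time.perf_counter() - self._local.t0
        with self._lock:
            self.count += 1
            self.total += elapsed

timer = ThreadSafeMiniTimer()

def work():
    for _ in range(5):
        with timer:
            sum(range(1000))

threads = [threading.Thread(target=work) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(timer.count)  # 10
```

The lock acquisition on every exit is the kind of cost the single-thread `PerfTimer` avoids.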
In the previous example, timing the entire function will include file I/O time, since `PerfTimer` measures wall time by default. For programs which happen to do I/O via the Trio async/await library, you can use `TrioPerfTimer`, which measures time only when the current task is executing:
```python
from perf_timer import TrioPerfTimer

@TrioPerfTimer('get thumbnail')
async def get_thumbnail_image(path):
    img = cache.get_thumbnail(path)
    if not img:
        img = await read_image(path)
        img.decode()
        img.resize(THUMBNAIL_SIZE)
        cache.set_thumbnail(img)
    return img
```
(Open challenge: support other async/await libraries)
This module also provides the `trio_perf_counter()` primitive. Following the semantics of the various performance counters in Python's `time` module, `trio_perf_counter()` provides high-resolution measurement of a Trio task's execution time, excluding periods where it's sleeping or blocked on I/O. (`TrioPerfTimer` uses this internally.)
```python
from perf_timer import trio_perf_counter

async def get_remote_object():
    t0 = trio_perf_counter()
    msg = await read_network_bytes()
    obj = parse(msg)
    print('task CPU usage (seconds):', trio_perf_counter() - t0)
    return obj
```
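The relationship is analogous to `time.thread_time()` versus `time.perf_counter()` for OS threads: the former advances only while the current thread is consuming CPU, not while it's blocked. A standard-library demonstration of that distinction (no Trio required):

```python
import time

t_wall = time.perf_counter()
t_cpu = time.thread_time()

time.sleep(0.05)   # blocked: wall time advances, thread CPU time barely does

wall = time.perf_counter() - t_wall
cpu = time.thread_time() - t_cpu
print(f'wall: {wall * 1e3:.1f} ms, thread CPU: {cpu * 1e3:.1f} ms')
```

`trio_perf_counter()` applies the same idea at task granularity, charging a task only for the intervals in which the Trio scheduler actually runs it.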
Install from PyPI:

```shell
pip install perf-timer
```
Measurement overhead is important. The smaller the timer's overhead, the less it interferes with the normal timing of your program, and the tighter the code loop it can be applied to.
The values below represent the typical overhead of one observation, as measured on ye old laptop (2014 MacBook Air 11 1.7GHz i7).
```
$ pip install -r test-requirements.txt
$ python benchmarks/overhead.py
compare observers:
  PerfTimer(observer=AverageObserver):    1.5 µs
  PerfTimer(observer=StdDevObserver):     1.8 µs  (default)
  PerfTimer(observer=HistogramObserver):  6.0 µs
compare types:
  PerfTimer(observer=StdDevObserver):        1.8 µs
  ThreadPerfTimer(observer=StdDevObserver):  9.8 µs
  TrioPerfTimer(observer=StdDevObserver):    4.8 µs
```
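For context, part of any timer's overhead is the cost of reading the clock itself. This stdlib-only sketch (the helper name `clock_overhead` is made up for illustration) estimates the per-call cost of `time.perf_counter()` by timing back-to-back reads:

```python
import time

def clock_overhead(n=200_000):
    """Estimate per-call cost of the clock by timing n back-to-back reads."""
    t0 = time.perf_counter()
    for _ in range(n):
        time.perf_counter()
    return (time.perf_counter() - t0) / n

print(f'time.perf_counter: ~{clock_overhead() * 1e9:.0f} ns/call')
```

A timer's total overhead per observation is roughly two clock reads plus whatever bookkeeping its observer does, which is why the heavier observers and the locking `ThreadPerfTimer` cost more in the table above.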