[!WARNING] I've paused development on LSIO for now. I've shifted my focus to hypergrib. In its current state, LSIO is a very minimal proof-of-concept that io_uring is faster than object_store when reading many small chunks of files from local PCIe 5 SSDs on Linux. There is no Python API yet.
The ultimate ambition is to enable folks to efficiently load and process large, multi-dimensional datasets as fast as modern CPUs & I/O subsystems will allow.
For now, this repo is just a place for me to tinker with ideas.
Under the hood, light-speed-io uses io_uring on Linux for local files, and will use object_store for all other data I/O.
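As a rough illustration of that split, the sketch below picks a backend from a location string. It is not LSIO's API: `Backend` and `choose_backend` are hypothetical names, and the rule "no URL scheme or `file://` means local" is an assumption made for the example.

```rust
/// Hypothetical enum for illustration; not part of LSIO.
#[derive(Debug, PartialEq, Eq)]
pub enum Backend {
    /// io_uring, for local files on Linux.
    IoUring,
    /// object_store, for everything else.
    ObjectStore,
}

/// Choose a backend for `location`.
///
/// Assumption: a location with no `scheme://` prefix, or with the `file`
/// scheme, is a local file. Anything else (e.g. `s3://...`) is remote.
pub fn choose_backend(location: &str, is_linux: bool) -> Backend {
    let is_local = match location.split_once("://") {
        None => true,
        Some((scheme, _rest)) => scheme.eq_ignore_ascii_case("file"),
    };
    if is_local && is_linux {
        Backend::IoUring
    } else {
        Backend::ObjectStore
    }
}
```

In real code the `is_linux` argument could be supplied by `cfg!(target_os = "linux")`; it is passed in explicitly here so that both branches are easy to exercise on any platform.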
My first use-case for light-speed-io is to help to speed up reading Zarr. After that, I'm interested in helping to create fast readers for "native" geospatial file formats like GRIB2 and EUMETSAT native files. And, even further than that, I'm interested in efficient & fast computation on out-of-core, chunked, labelled, multi-dimensional data.
See planned_design.md for more info on the planned design. And please see this blogpost for my motivations for wanting to help speed up Zarr.
(This will almost certainly change!)
The list below is in (rough) chronological order. This roadmap is also represented in the GitHub milestones for this project, when sorted alphabetically.
- io_uring prototype
- io_uring prototype using Rayon to loop through io_uring completion queue
- io_uring async/await implementation with object_store-like API
- Tokio with Rayon
- O_DIRECT
- object_store using criterion
- io_uring
- Box::into_raw for tracking in-flight operations
- Rayon for the IO threadpool
- Stream.
- lsio_aligned_bytes: Shareable buffer which can be aligned to arbitrary boundaries at runtime
- lsio_threadpool: Work-stealing threadpool (based on crossbeam-deque)
- lsio_io: Traits for all LSIO IO backends
- lsio_uring IO backend (for loading data from a local SSD) with user-defined number of worker threads
- lsio_uring backend
- lsio_object_store_bridge IO backend
- lsio_uring vs lsio_object_store_bridge
- lsio_zarr
- lsio_zarr vs zarr-python v3 (from Python)
- lsio_uring
- lsio_object_store_bridge
- lsio_zarr
- io_uring
- (Although maybe this won't be necessary because dynamical.org are converting datasets to Zarr)
- .nat files to 10-bit per channel bit-packed Zarr). If there's no computation to be done on the data during conversion then do all the copying with io_uring: open source file -> read chunks from source -> write to destination -> etc.

Light Speed IO is organised as a Cargo workspace with multiple (small) crates. The crates are organised in a flat structure, as used by projects such as Ruff, Polars, and rust-analyzer.
LSIO crate names use snake_case, following in the footsteps of the Rust Book and Ruff. (The choice of snake_case versus hyphens is, as far as I can tell, entirely arbitrary: Polars and rust-analyzer both use hyphens. I just prefer the look of underscores!)
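The roadmap mentions both O_DIRECT and lsio_aligned_bytes, a buffer that can be aligned to arbitrary boundaries at runtime. To illustrate why runtime alignment is useful: O_DIRECT reads typically require the buffer address to be a multiple of the device's logical block size, which is only known at runtime. The sketch below is not the lsio_aligned_bytes API. `AlignedBuf` and its methods are hypothetical names, it uses only `std::alloc`, and unlike the crate described in the roadmap it makes no attempt to be shareable.

```rust
use std::alloc::{alloc_zeroed, dealloc, Layout};
use std::ptr::NonNull;

/// Hypothetical type for illustration; not the real `lsio_aligned_bytes`.
pub struct AlignedBuf {
    ptr: NonNull<u8>,
    layout: Layout,
}

impl AlignedBuf {
    /// Allocate `len` zeroed bytes whose start address is a multiple of `align`.
    ///
    /// Returns `None` if `len` is zero, if `align` is not a power of two,
    /// or if the allocation fails.
    pub fn new(len: usize, align: usize) -> Option<Self> {
        if len == 0 {
            return None;
        }
        // `from_size_align` rejects alignments that are not powers of two.
        let layout = Layout::from_size_align(len, align).ok()?;
        // SAFETY: `layout` has a non-zero size, as checked above.
        let ptr = NonNull::new(unsafe { alloc_zeroed(layout) })?;
        Some(Self { ptr, layout })
    }

    /// View the buffer as a byte slice.
    pub fn as_slice(&self) -> &[u8] {
        // SAFETY: `ptr` points to `layout.size()` zero-initialised bytes that we own.
        unsafe { std::slice::from_raw_parts(self.ptr.as_ptr(), self.layout.size()) }
    }

    /// View the buffer as a mutable byte slice.
    pub fn as_mut_slice(&mut self) -> &mut [u8] {
        // SAFETY: as above, and `&mut self` guarantees exclusive access.
        unsafe { std::slice::from_raw_parts_mut(self.ptr.as_ptr(), self.layout.size()) }
    }
}

impl Drop for AlignedBuf {
    fn drop(&mut self) {
        // SAFETY: `ptr` was allocated by `alloc_zeroed` with exactly this `layout`.
        unsafe { dealloc(self.ptr.as_ptr(), self.layout) }
    }
}
```

For example, `AlignedBuf::new(4096, 512)` asks for a 4 KiB buffer aligned to 512 bytes. The alignment is an ordinary runtime value, so it could instead come from querying the device's block size.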