(Educational) Build your own disk-based KV store
MIT License
CaskDB is a disk-based, embedded, persistent key-value store based on Riak's Bitcask paper, written in Python. It is geared towards education rather than production use. The file format is independent of platform, machine, and programming language. For example, a database file created by a Python implementation on macOS should be readable by a Rust implementation on Windows.
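One common way to get this kind of portability is to fix the byte order and field widths explicitly instead of relying on the platform's native layout. The sketch below assumes a hypothetical 12-byte header of three little-endian unsigned 32-bit integers (timestamp, key size, value size). The field choice and the names `encode_record` and `decode_record` are illustrative assumptions, not necessarily CaskDB's actual format.

```python
import struct
from typing import Tuple

# Hypothetical header layout (an assumption for illustration):
# timestamp, key size, value size, each a little-endian unsigned 32-bit int.
# The "<" prefix selects little-endian byte order with standard sizes,
# so the header is 12 bytes regardless of OS or CPU.
HEADER_FORMAT = "<LLL"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)


def encode_record(timestamp: int, key: str, value: str) -> bytes:
    """Serialise one key-value pair as header + key bytes + value bytes."""
    key_bytes = key.encode("utf-8")
    value_bytes = value.encode("utf-8")
    header = struct.pack(HEADER_FORMAT, timestamp, len(key_bytes), len(value_bytes))
    return header + key_bytes + value_bytes


def decode_record(data: bytes) -> Tuple[int, str, str]:
    """Inverse of encode_record: parse the header, then slice out key and value."""
    timestamp, key_size, value_size = struct.unpack(HEADER_FORMAT, data[:HEADER_SIZE])
    key_end = HEADER_SIZE + key_size
    key = data[HEADER_SIZE:key_end].decode("utf-8")
    value = data[key_end:key_end + value_size].decode("utf-8")
    return timestamp, key, value


record = encode_record(1, "othello", "shakespeare")
```

Because the sizes are stored in the header, a reader written in any language can walk the file record by record without knowing anything about the writer's platform.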
This project aims to help anyone, even a beginner in databases, build a persistent database in a few hours. It has no external dependencies; the Python standard library is all you need.
If you are interested in writing the database yourself, head to the workshop section.
Most of the following limitations are specific to CaskDB. Some, however, stem from design constraints in the Bitcask paper.
Consider joining the Discord community to build and learn KV Store with peers.
CaskDB does not require any external libraries to run. For local development, install the packages from requirements_dev.txt:
pip install -r requirements_dev.txt
CaskDB is not yet published on PyPI (issue #5), so you have to install it by cloning the repository.
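# assumes the script runs from the repository root, where disk_store.py lives
from disk_store import DiskStorage

disk: DiskStorage = DiskStorage(file_name="books.db")
disk.set(key="othello", value="shakespeare")
author: str = disk.get("othello")
# it also supports a dictionary-style API:
disk["hamlet"] = "shakespeare"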
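A dictionary-style interface like this is usually provided in Python through the `__getitem__` and `__setitem__` special methods. The sketch below uses a hypothetical in-memory `TinyStore` class (not CaskDB's `DiskStorage`) purely to show how the two call styles can share one implementation.

```python
from typing import Dict


class TinyStore:
    """Hypothetical stand-in for a KV store; keeps data in a dict, not on disk."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def set(self, key: str, value: str) -> None:
        self._data[key] = value

    def get(self, key: str) -> str:
        # Returning "" for a missing key is an arbitrary choice for this sketch.
        return self._data.get(key, "")

    def __setitem__(self, key: str, value: str) -> None:
        # store[key] = value delegates to set()
        self.set(key, value)

    def __getitem__(self, key: str) -> str:
        # store[key] delegates to get()
        return self.get(key)


store = TinyStore()
store["hamlet"] = "shakespeare"
store.set("othello", "shakespeare")
```

Keeping the special methods as thin wrappers means there is only one code path to test for reads and one for writes.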
The workshop is for intermediate to advanced programmers. Knowing Python is not a requirement, and you can build the database in any language you wish.
Not sure where you stand? You are ready if you have done the following in any language:
NOTE: I don't have any workshops scheduled in the near future. Follow me on Twitter for updates. Drop me an email if you wish to arrange a workshop for your team or company.
CaskDB comes with a full test suite and a wide range of tools to help you write a database quickly. A GitHub Action is included with an automated test runner, code formatter, linter, type checker, and static analyser. Fork the repo, push the code, and pass the tests!
Throughout the workshop, you will implement the following:
start-here branch
test_format.py
test_disk_store.py
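The core idea from the Bitcask paper, an append-only data file plus an in-memory index that maps each key to the position of its latest value, can be sketched as follows. This is a simplified illustration under assumptions of its own (a hypothetical `MiniCask` class, a header of two little-endian 32-bit sizes, no timestamps, checksums, or index rebuild on reopen) and is not the workshop's reference solution.

```python
import os
import struct
import tempfile
from typing import Dict, Tuple

SIZES = struct.Struct("<LL")  # assumed header: key size, value size


class MiniCask:
    """Hypothetical append-only store with an in-memory key directory."""

    def __init__(self, path: str) -> None:
        # "a+b" creates the file if needed and sends every write to the end.
        self._file = open(path, "a+b")
        # key -> (offset of the value bytes, length of the value bytes)
        self._index: Dict[str, Tuple[int, int]] = {}

    def set(self, key: str, value: str) -> None:
        key_bytes = key.encode("utf-8")
        value_bytes = value.encode("utf-8")
        self._file.seek(0, os.SEEK_END)
        record_start = self._file.tell()
        self._file.write(SIZES.pack(len(key_bytes), len(value_bytes)))
        self._file.write(key_bytes)
        self._file.write(value_bytes)
        self._file.flush()
        value_offset = record_start + SIZES.size + len(key_bytes)
        # A later write for the same key just replaces the index entry.
        self._index[key] = (value_offset, len(value_bytes))

    def get(self, key: str) -> str:
        if key not in self._index:
            return ""
        offset, length = self._index[key]
        self._file.seek(offset)
        return self._file.read(length).decode("utf-8")

    def close(self) -> None:
        self._file.close()


path = os.path.join(tempfile.mkdtemp(), "books.db")
db = MiniCask(path)
db.set("othello", "shakespeare")
db.set("othello", "william shakespeare")  # the old record stays in the file
```

Reads cost one seek because the index already knows where the value lives. The trade-off is that stale records accumulate in the file until some form of compaction removes them.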
Use make lint to run mypy, black, and the pytype static analyser. Run make test to run the tests locally. Push the code to GitHub, and the tests will run on different operating systems: Ubuntu, macOS, and Windows.
Not sure how to proceed? Check the hints file, which contains more details on the tasks.
I often get asked what comes next after the basic implementation. Here are some challenges (with varying levels of difficulty):
This project was previously named cdb and has since been renamed to CaskDB.
$ tokei -f format.py disk_store.py
===============================================================================
 Language            Files        Lines         Code     Comments       Blanks
===============================================================================
 Python                  2          391          261          103           27
-------------------------------------------------------------------------------
 disk_store.py                      204          120           70           14
 format.py                          187          141           33           13
===============================================================================
 Total                   2          391          261          103           27
===============================================================================
All contributions are welcome. Please check CONTRIBUTING.md for more details.
CaskDB is released under the MIT license. Please check LICENSE for more details.