# datapact

> pytest, but for dataframes

datapact is a Python library for verifying your data by declaring expectations on it. Licensed under the MIT License.
```python
import datapact

dp = datapact.test(df)

dp.age.must.be_positive()
dp.name.should.not_be_empty()
```
It works with Pandas and Dask DataFrames, and has special support for Jupyter Notebooks.

Here are some of its features:

- `.should` for warnings, `.must` for failures

Get started here: https://datapact.dev
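To illustrate the `.should` vs. `.must` distinction above, here is a minimal plain-Python sketch of warning-style versus failure-style checks. This is not datapact's actual implementation: the `Check` class, its `hard` flag, and `be_positive` are hypothetical stand-ins for the behavior the feature list describes.

```python
import warnings


class Check:
    """Hypothetical column check: hard=True raises, hard=False only warns."""

    def __init__(self, values, hard):
        self.values = values
        self.hard = hard  # True mimics `.must`, False mimics `.should`

    def _report(self, message):
        if self.hard:
            raise AssertionError(message)  # `.must`: a hard failure
        warnings.warn(message)  # `.should`: a warning, execution continues

    def be_positive(self):
        bad = [v for v in self.values if v <= 0]
        if bad:
            self._report(f"expected positive values, found {bad}")


ages = [29, 41, -3]

Check(ages, hard=False).be_positive()  # warns about -3, keeps running
try:
    Check(ages, hard=True).be_positive()  # raises AssertionError
except AssertionError as exc:
    print("failure:", exc)
```

The design point is simply that the same check can either block a pipeline or merely flag a concern, which is why datapact exposes both accessors on every column.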
## Datapact Track

Datapact Track is an optional, browser-based data tracking service. It's fully self-hostable via Docker and Postgres, and there's a hosted version available at track.datapact.dev.
Connecting your test suite is one line of code:
```python
dp.connect(
    server="track.datapact.dev",
    token="..."  # get this from the UI
)
```
Datapact Track gives you:
Try out Datapact Track at track.datapact.dev, or follow the self-hosting guide to deploy your own instance.
## datapact vs. Great Expectations
Both datapact and Great Expectations help you improve Data Quality, but with a different approach.
Great Expectations has its own JSON-based storage format for expectation suites, and it gives you a custom UI to edit them. It's much bigger than datapact in project size and scope, but also in complexity.
datapact is much younger, community-run, and more of a library than a framework. Its main differentiator is that it lets you express your test suites in Python code, right alongside your other code.
This works in Python scripts, Jupyter Notebooks, and pipeline tests: everywhere that Python runs. And because your tests are code, you can co-locate them with the rest of your codebase, then version-control and review them just like everything else.
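As a sketch of what co-location looks like, here is a test file using only the standard library. The `load_users` helper and the record layout are hypothetical; a real suite would call datapact as shown earlier, but the point is the same: data checks sit in an ordinary test module that pytest can collect and your team can review.

```python
# test_users.py: data checks co-located with ordinary unit tests.


def load_users():
    # Hypothetical loader; a real project would read from its data source.
    return [
        {"name": "Ada", "age": 36},
        {"name": "Grace", "age": 45},
    ]


def test_users_are_valid():
    users = load_users()
    assert all(u["age"] > 0 for u in users)  # cf. dp.age.must.be_positive()
    assert all(u["name"] for u in users)     # cf. dp.name.should.not_be_empty()


test_users_are_valid()  # pytest would normally collect and run this itself
```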
If you already know how to use Great Expectations, you should use it.
If you found its learning curve too steep, take a look at datapact: it's designed to be easy to get started with and intuitive to use.
Thanks goes to these wonderful people (emoji key):
This project follows the all-contributors specification. Contributions of any kind welcome!