HandySpark - bringing pandas-like capabilities to Spark dataframes
MIT License
Visualizer for pandas data structures
Data conversions and examples for generating reports from twarc collections using tools such as D...
sidetable builds simple but useful summary tables of your data
This is a guide to PySpark code style presenting common situations and the associated best practi...
Default risk prediction for Home Credit competition - Fast, scalable and maintainable SQL-based f...
A grammar for data manipulation in Python
Productivity Utilities for Data Science with Python Notebooks
Commonly Consumed Code Commodities
Find data quality issues and clean your data in a single line of code with a Scikit-Learn compati...
Missing data visualization module for Python.
A small time-series transformation API built on Flask and pandas
A collection of code snippets from the publication Daily Dose of Data Science on Substack: http:/...
A Python package to make publication-ready but customizable coefficient plots.
Partial result caching for pandas in Python.
Type System for Data Analysis in Python