A PySpark companion for data science tasks.
GPL-3.0 License
Type System for Data Analysis in Python
A library for building structured LLM responses with Spark
A command-line tool for analyzing large biological data sets on distributed HPC clusters.
PySpark-Tutorial provides basic algorithms using PySpark
Apache Spark - A unified analytics engine for large-scale data processing
Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython /...
PySpark + Scikit-learn = Sparkit-learn
Simple and Distributed Machine Learning
Apache Spark Machine Learning project using MLlib and Linear Regression on Databricks!
A collection of utilities for handling PySpark's SparkContext
A pure Python implementation of Apache Spark's RDD and DStream interfaces.
A repository to keep track of all the code that I end up writing for my blog posts.
Distributed scikit-learn meta-estimators in PySpark
Spark extension for processing large-scale 3D data sets: Astrophysics, High Energy Physics, Meteo...
FITS data source for Spark SQL and DataFrames