splink | Apache Spark Ecosystem Directory

Commit Statistics

                              Past Year    All Time
Total Commits                 2,667        7,144
Total Committers
Avg. Commits Per Committer    65.05        96.54
Bot Commits

Issue Statistics

                              Past Year        All Time
Total Pull Requests           455              652
Merged Pull Requests          388              553
Total Issues                  180              373
Time to Close Issues          about 1 month    3 months

Package Rankings

Top 31.53% on Conda-forge.org

Top 3.92% on Pypi.org

Related Projects

analytics-zoo

Distributed Tensorflow, Keras and PyTorch on Apache Spark/Flink & Ray

05 Mar 2024 9

spark-py-notebooks

Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython /...

06 May 2015 1,614

aut

The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.

06 Jul 2017 133

data_lakehouse_local_stack

Data Lakehouse local stack with PySpark, Trino, and Minio. Includes an example to process Raygun ...

21 Jun 2024 0

spark-sql-flow-plugin

Visualize column-level data lineage in Spark SQL

14 Jun 2021 85

pixiedust

Python Helper library for Jupyter Notebooks

01 Jul 2016 1,037

data_science_blogs

A repository to keep track of all the code that I end up writing for my blog posts.

25 Dec 2019 252

visions

Type System for Data Analysis in Python

12 Dec 2019 205

Spark

Python 2.7 code and learning notes for Spark 2.1.1

23 Feb 2017 24

Clustering4Ever

C4E, a JVM friendly library written in Scala for both local and distributed (Spark) Clustering.

26 Mar 2018 130

fugue

A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars cod...

24 Mar 2020 1,901

flink

Apache Flink

07 Jun 2014 23,713

linkis

Apache Linkis builds a computation middleware layer to facilitate connection, governance and orch...

23 Jul 2019 3,246

SynapseML

Simple and Distributed Machine Learning

05 Jun 2017 4,986

deequ

Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure...

07 Aug 2018 3,261