pyspark_pandas

PySpark + pandas. This project may be merged into the SparklingPandas project.

An existing project, SparklingPandas, already integrates pandas and PySpark. Look at that project first, since this one may be merged into it. This project aims to provide useful tools and algorithms for distributing pandas objects on Spark.

  • Apache Spark is a fast and general engine for large-scale data processing. It is written in Scala and also supports Python via PySpark.

  • Pandas is a library providing high-performance and easy-to-use data structures and data analysis tools for the Python programming language.
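As a rough illustration of what "distributing pandas objects on Spark" can mean, the sketch below splits a pandas DataFrame into row-wise chunks and ships them to Spark as an RDD of smaller DataFrames. It is not this package's API: the helper names `split_frame`, `distribute_frame`, `distributed_sum`, and `run_local_demo` are hypothetical, and only standard pandas, NumPy, and PySpark calls are used.

```python
import numpy as np
import pandas as pd


def split_frame(df, num_chunks):
    """Split a pandas DataFrame into roughly equal row-wise chunks."""
    # Split the row positions, then select each group of rows by position.
    positions = np.array_split(np.arange(len(df)), num_chunks)
    return [df.iloc[idx] for idx in positions]


def distribute_frame(sc, df, num_chunks=4):
    """Return an RDD whose elements are pandas DataFrame chunks.

    `sc` is assumed to be an existing pyspark.SparkContext.
    """
    chunks = split_frame(df, num_chunks)
    # One Spark partition per chunk; each chunk is pickled to the workers.
    return sc.parallelize(chunks, len(chunks))


def distributed_sum(rdd, column):
    """Sum one column across all chunks: per-chunk sum, then a reduce."""
    return rdd.map(lambda chunk: chunk[column].sum()).reduce(lambda a, b: a + b)


def run_local_demo():
    """Hypothetical end-to-end run; requires a local PySpark installation."""
    from pyspark import SparkContext

    sc = SparkContext("local[2]", "pandas-chunks-demo")
    try:
        df = pd.DataFrame({"x": range(10)})
        rdd = distribute_frame(sc, df)
        return distributed_sum(rdd, "x")
    finally:
        sc.stop()
```

Keeping each chunk as a pandas DataFrame means per-partition work can use ordinary pandas operations, at the cost of pickling every chunk when it is sent to the workers.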

Did you check out SparklingPandas?
