Python 2.7 code and learning notes for Spark 2.1.1
GPL-3.0 License
Simple and Distributed Machine Learning
C4E, a JVM-friendly library written in Scala for both local and distributed (Spark) clustering.
Test cases and related reference material collected while working with Scala and Spark
Apache Spark - A unified analytics engine for large-scale data processing
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Ka...
An open-source toolkit for analyzing line-oriented JSON Twitter archives with Apache Spark.
A guide to setting up Jupyter with PySpark painlessly on AWS EC2 clusters, with S3 I/O support
A repository for all the code I write for my blog posts.
PySpark-Tutorial provides basic algorithms using PySpark
A library for building structured LLM responses with Spark
Spark extension for processing large-scale 3D data sets: Astrophysics, High Energy Physics, Meteo...
A recommender system for discovering GitHub repos, built with Apache Spark
An analysis of the principles behind Spark ML algorithms, with a walkthrough of their source code implementations
Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython /...
PySpark + Scikit-learn = Sparkit-learn