Implementing best practices for PySpark ETL jobs and applications.
macOS development environment setup: easy-to-understand instructions with automated setup script...
A library developed to ease the data ETL development process.
Apache Lucene.NET
The project aims to automate content classification and knowledge retrieval, as well as to perfor...
Jupyter magics and kernels for working with remote Spark clusters
Spark examples
sbt plugin for spark-submit
A Python package to submit and manage Apache Spark applications on Kubernetes.
REST job server for Apache Spark
Sparglim✨ makes PySpark apps configurable and deploying a Spark Connect server easier!
A boilerplate for Spark projects with Docker support for local development and scripts for EMR su...
Apache Spark - A unified analytics engine for large-scale data processing
A free tutorial for Apache Spark.
Service for extracting tables from the CCAO system-of-record and uploading them to the Data Depar...