A Data Engineering & Machine Learning Knowledge Hub
EXPLAIN
Solution Accelerators for Serverless Spark on GCP, the industry's first auto-scaling and serverle...
Focused on big data learning and interview preparation; the road to big data mastery starts here. Flink/Spark/Hadoop/Hbase/Hive...
Awesome list for Data Lake
A repository to keep track of all the code that I end up writing for my blog posts.
Distributed Tensorflow, Keras and PyTorch on Apache Spark/Flink & Ray
A minimal benchmark for scalability, speed and accuracy of commonly used open source implementati...
Data Lakehouse local stack with PySpark, Trino, and Minio. Includes an example to process Raygun ...
A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars cod...
Simple and Distributed Machine Learning
SQL data analysis & visualization projects using MySQL, PostgreSQL, SQLite, Tableau, Apache Spark...
Dataproc templates and pipelines for solving simple in-cloud data tasks
More than 2,000 data engineer interview questions.
Master's Final Degree Project on Artificial Intelligence and Big Data
Visualize column-level data lineage in Spark SQL
Code Library for My Blog