Spark in Action, 2nd edition - chapter 1 - Introduction
This repository contains Scala and Python versions of the Java code used in Manning Publications' Spark in Action, 2nd edition, by Jean-Georges Perrin.
Chapter 1 introduces the book and offers a basic example.
This code is designed to work with Apache Spark v3.1.2.
Each chapter has one or more labs. Labs are the examples used for teaching in the book. You are encouraged to take ownership of the code, modify it, and experiment with it, hence the term lab. This chapter has only one lab.
The CsvToDataframeApp application starts by acquiring a session (a SparkSession) and, as its name indicates, loads a CSV file into a dataframe.
For information on running the Java lab, see chapter 1 in Spark in Action, 2nd edition.
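The flow described above (get a SparkSession, then turn a CSV file into a dataframe) can be sketched in PySpark as follows. This is a minimal illustration, not the lab's actual source: the function name `csv_to_dataframe`, the `READ_OPTIONS` dictionary, the application name, the `local[*]` master, and the choice to show five rows are all assumptions, and the CSV path is whatever file you pass on the command line. PySpark is imported inside the function so that the file can be read and imported on a machine without Spark installed.

```python
import sys

# Assumed reader options: treat the first line of the CSV file as a header.
READ_OPTIONS = {"header": "true"}


def csv_to_dataframe(csv_path, rows_to_show=5):
    """Load a CSV file into a Spark dataframe and print its first rows."""
    # Imported here so that only running the lab requires PySpark.
    from pyspark.sql import SparkSession

    # Step 1: acquire a session on a local master (an assumption for this sketch).
    spark = (
        SparkSession.builder
        .appName("CSV to DataFrame sketch")
        .master("local[*]")
        .getOrCreate()
    )

    try:
        # Step 2: ask Spark to read the CSV file into a dataframe.
        df = spark.read.format("csv").options(**READ_OPTIONS).load(csv_path)

        # Step 3: display the first rows of the dataframe.
        df.show(rows_to_show)
    finally:
        # Release the session's resources even if the read fails.
        spark.stop()


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: spark-submit this_sketch.py path/to/some.csv
    csv_to_dataframe(sys.argv[1])
```

Passing the CSV path as an argument keeps the sketch independent of the repository layout; the lab in this repository may locate its data differently.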
Prerequisites for the Python lab: you will need git.
git clone https://github.com/jgperrin/net.jgp.books.spark.ch01
cd net.jgp.books.spark.ch01/src/main/python/lab100_csv_to_dataframe/
spark-submit csvToDataframeApp.py
Prerequisites for the Scala lab: you will need git.
git clone https://github.com/jgperrin/net.jgp.books.spark.ch01
cd net.jgp.books.spark.ch01
Package the application with sbt:
sbt clean assembly
spark-submit --class net.jgp.books.spark.ch01.lab100_csv_to_dataframe.CsvToDataframeScalaApp target/scala-2.11/SparkInAction2-Chapter01-assembly-1.0.0.jar
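Before running either lab, it may help to confirm that the command-line tools used in the steps above are available on your PATH. The sketch below uses only the Python standard library. The tool lists are inferred from the commands in this README (git, sbt, spark-submit) rather than taken from an official prerequisite list, and the names `missing_tools`, `PYTHON_LAB_TOOLS`, and `SCALA_LAB_TOOLS` are hypothetical.

```python
import shutil

# Tools inferred from the commands in this README; adjust as needed.
PYTHON_LAB_TOOLS = ["git", "spark-submit"]
SCALA_LAB_TOOLS = ["git", "sbt", "spark-submit"]


def missing_tools(tools):
    """Return the tools from `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    for lab, tools in (("Python", PYTHON_LAB_TOOLS), ("Scala", SCALA_LAB_TOOLS)):
        missing = missing_tools(tools)
        status = "ready" if not missing else "missing: " + ", ".join(missing)
        print(f"{lab} lab: {status}")
```

A tool reported as missing may still be installed but not on the PATH of the shell you are using, so treat the output as a hint rather than a verdict.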
Follow me on Twitter to get updates about the book and Apache Spark: @jgperrin. Join the book's community on Facebook or on Manning's live site.