Course outline for Spark
About Spark
Pre-requisites for learning Spark
Lab Setup
- Hardware Configuration
  - A minimum of 20GB of disk space and at least
  - Ensure that all participants have a properly functioning Internet connection
- Software Configuration
  - Ubuntu 20.04/22.04 Desktop/Server edition
Duration
- 2-5 days
Training Mode
Online training for Spark
We provide:
- Instructor-led live training
- Self-paced learning with access to expert coaches
- 24x7 access to cloud labs with end-to-end working examples
All jnaapti sessions are 100% hands-on. All our instructors are engineers at heart. Activities are derived from real-life problems faced by our expert faculty. Self-paced hands-on sessions are delivered via Virtual Coach.
Classroom training for Spark
Classroom sessions are conducted at client locations in:
- Bengaluru
- Chennai
- Hyderabad
- Mumbai
- Delhi/Gurgaon/NCR
Note: Classroom training is for corporate clients only
Detailed Course Outline for Spark
Big Data
- Dealing with web-scale data
- How big is big data?
- Where is the data?
Map/Reduce Algorithms
- Map-Only
- Sorting
- Inverted Indexes
- Counting and Summing
- Filtering
- Trying out a few examples (a word-count sketch follows this list)
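To make these patterns concrete before any cluster tooling appears, here is a minimal word-count sketch of the counting-and-summing pattern written with plain Scala collections; the sample lines are invented purely for illustration.

```scala
object CountingAndSumming {
  def main(args: Array[String]): Unit = {
    // A tiny in-memory "dataset"; in a real job these would be lines read from files
    val lines = Seq(
      "spark makes big data simple",
      "big data needs big clusters"
    )

    // Map step: emit a (word, 1) pair for every word in every line
    val mapped = lines.flatMap(_.split("\\s+")).map(word => (word, 1))

    // Shuffle/Reduce step: group the pairs by word and sum the counts
    val counts = mapped.groupBy(_._1).map { case (word, pairs) =>
      word -> pairs.map(_._2).sum
    }

    counts.toSeq.sortBy { case (_, count) => -count }
      .foreach { case (word, count) => println(s"$word -> $count") }
  }
}
```

The same two-step shape, map each record into key-value pairs and then reduce per key, carries over directly to the Hadoop and Spark jobs built in the later modules.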
Map/Reduce
- The traditional parallel programming paradigm
- Issues with traditional parallel programming
- Introduction to Map/Reduce
- Thinking in Map/Reduce
Introduction to Spark
- What is Spark?
- Differences from Pig and Hive
- Installing Spark
- Where to use Spark
- Linking with Spark
- Using the Shell (a starter sketch follows this list)
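As a taste of the hands-on work in this module, here is a sketch of a first standalone application linked against spark-core; the Spark version, application name, and use of local[*] are illustrative choices rather than course requirements.

```scala
// build.sbt (assuming a Spark 3.x / Scala 2.12 toolchain; adjust versions to your setup):
//   libraryDependencies += "org.apache.spark" %% "spark-core" % "3.5.1"

import org.apache.spark.{SparkConf, SparkContext}

object HelloSpark {
  def main(args: Array[String]): Unit = {
    // local[*] keeps the sketch runnable on a single laptop without a cluster
    val conf = new SparkConf().setAppName("hello-spark").setMaster("local[*]")
    val sc   = new SparkContext(conf)

    val data = sc.parallelize(1 to 1000)   // a first RDD from an in-memory range
    println(s"sum = ${data.sum()}, count = ${data.count()}")

    sc.stop()
  }
}
```

Inside spark-shell the same experiment needs no setup at all, because the shell already exposes a ready-made SparkContext as sc.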
Resilient Distributed Datasets (RDDs)
- Parallelized Collections
- External Datasets
- RDD Operations
  - Basics
  - Passing functions to Spark
  - Closures
  - Working with Key-Value Pairs
  - Transformations
  - Actions
  - Shuffle operations (illustrated in the sketch after this list)
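The sketch below runs through the operations listed above: a parallelized collection as the source RDD, lazy transformations, actions that trigger the computation, and a key-value reduceByKey that causes a shuffle. The sample data and names are invented for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RddOperations {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("rdd-ops").setMaster("local[*]"))

    // Parallelized collection as the source RDD
    val nums = sc.parallelize(1 to 10)

    // Transformations are lazy: nothing executes until an action is called
    val squares = nums.map(n => n * n)
    val evens   = squares.filter(_ % 2 == 0)

    // Actions trigger the computation and return results to the driver
    println(evens.collect().mkString(", "))
    println(s"count = ${evens.count()}")

    // Key-value pairs: reduceByKey redistributes data by key, i.e. a shuffle
    val sales  = sc.parallelize(Seq(("north", 10), ("south", 5), ("north", 7)))
    val totals = sales.reduceByKey(_ + _)
    totals.collect().foreach { case (region, total) => println(s"$region: $total") }

    sc.stop()
  }
}
```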
RDD Persistence
- Which Storage Level to Choose?
- Removing Data (see the sketch after this list)
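A short sketch of the persistence topics, assuming the RDD API: MEMORY_AND_DISK is used here purely as an example storage level, and unpersist() shows how cached data is removed once it is no longer needed.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object PersistenceDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("persist-demo").setMaster("local[*]"))

    // An RDD that is expensive enough to be worth caching
    val expensive = sc.parallelize(1 to 1000000).map(n => math.sqrt(n.toDouble))

    // MEMORY_AND_DISK spills partitions to disk when they do not fit in memory;
    // plain cache() is shorthand for persist(StorageLevel.MEMORY_ONLY)
    expensive.persist(StorageLevel.MEMORY_AND_DISK)

    println(expensive.sum())    // first action computes and caches the RDD
    println(expensive.count())  // second action reuses the cached partitions

    // Removing data: drop the cached copy explicitly
    expensive.unpersist()
    sc.stop()
  }
}
```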
Shared Variables
- Broadcast Variables
- Accumulators (a sketch of both follows this list)
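The following sketch illustrates both kinds of shared variables with a made-up country-code lookup: the broadcast variable ships a read-only map to the executors once, and the accumulator counts codes that were not found.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SharedVariablesDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("shared-vars").setMaster("local[*]"))

    // Broadcast variable: a read-only lookup table shipped to every executor once
    val countryNames = sc.broadcast(Map("IN" -> "India", "US" -> "United States"))

    // Accumulator: tasks only add to it; the driver reads the final value
    val unknownCodes = sc.longAccumulator("unknown country codes")

    val codes = sc.parallelize(Seq("IN", "US", "BR", "IN"))
    val resolved = codes.map { code =>
      countryNames.value.getOrElse(code, {
        unknownCodes.add(1)
        "unknown"
      })
    }

    println(resolved.collect().mkString(", "))            // the action triggers the updates
    println(s"unknown codes seen: ${unknownCodes.value}")
    sc.stop()
  }
}
```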
Deploying to a Cluster
- Launching Applications with spark-submit
- Advanced Dependency Management
Launching Spark jobs from Java/Scala
- SparkLauncher
- SparkAppHandle (see the launcher sketch after this list)
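Below is a sketch of launching a Spark job from another JVM process with SparkLauncher and monitoring it through the returned SparkAppHandle. The jar path, main class, and driver-memory value are placeholders, and the code assumes SPARK_HOME points at a Spark installation and that the spark-launcher artifact is on the classpath.

```scala
import org.apache.spark.launcher.{SparkAppHandle, SparkLauncher}

object LaunchFromCode {
  def main(args: Array[String]): Unit = {
    // The resource path, main class and master below are placeholders for this sketch
    val handle: SparkAppHandle = new SparkLauncher()
      .setAppResource("/path/to/my-spark-job.jar")
      .setMainClass("com.example.MySparkJob")
      .setMaster("local[*]")
      .setConf(SparkLauncher.DRIVER_MEMORY, "2g")
      .startApplication()

    // SparkAppHandle lets the parent JVM observe and control the child application
    while (!handle.getState.isFinal) {
      println(s"state = ${handle.getState}, appId = ${handle.getAppId}")
      Thread.sleep(1000)
    }
    println(s"final state = ${handle.getState}")
  }
}
```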
Using Spark with Different Languages
- Scala
- Java
- Python
- R
PySpark and Jupyter
- Introduction to PySpark
- Using Jupyter Notebook
- Using the PySpark shell
- PySpark in Jupyter