Course outline for Spark
Technology: Spark
Duration: 2-5 days
Level: Prerequisites listed
About Spark
Prerequisites
Lab Setup
- Hardware Configuration
  - A minimum of 20GB of disk space and at least
  - Ensure that all participants have a properly functioning Internet connection
- Software Configuration
  - Ubuntu 20.04 or 22.04 (Desktop or Server edition)
How we train
Online training for Spark
- Instructor-led live cohorts
- Self-paced learning with expert coaches
- 24x7 cloud labs with end-to-end examples
All sessions are 100% hands-on. Labs and activities are derived from real-world work our engineers deliver.
Classroom training
Available for corporate teams in:
- Bengaluru
- Chennai
- Hyderabad
- Mumbai
- Delhi/Gurgaon/NCR
- Pune
Note: Classroom training is for corporate clients only.
Self-paced hands-on sessions are delivered via VirtualCoach.
Detailed Course Outline
Big Data
- Dealing with web-scale data
- How big is big data?
- Where is the data?
Map/Reduce Algorithms
- Map-Only
- Sorting
- Inverted Indexes
- Counting and Summing
- Filtering
- Trying out a few examples
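The algorithm patterns above can be tried without a cluster. The sketch below is a minimal, in-memory illustration, assuming a hypothetical `map_reduce` helper and three made-up documents: counting and summing emits `(word, 1)` and sums per key, an inverted index emits `(word, doc_id)` and collects documents per key, filtering is map-only, and sorting comes for free because keys reach the reducers in sorted order (as in a sort-based shuffle).

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Hypothetical helper: map every record, group pairs by key, reduce each group."""
    groups = defaultdict(list)
    for record in records:
        # Map phase: each record may emit zero or more (key, value) pairs.
        for key, value in mapper(record):
            groups[key].append(value)  # shuffle: collect values under their key
    # Reduce phase: visit keys in sorted order, like a sort-based shuffle delivers them.
    return {key: reducer(key, values) for key, values in sorted(groups.items())}


# Hypothetical sample input: (document id, text) pairs.
docs = [
    ("d1", "spark is fast"),
    ("d2", "spark is fun"),
    ("d3", "hadoop is batch"),
]

# Counting and summing: emit (word, 1) and add up the ones for each word.
word_counts = map_reduce(
    docs,
    mapper=lambda doc: ((word, 1) for word in doc[1].split()),
    reducer=lambda word, ones: sum(ones),
)

# Inverted index: emit (word, doc_id) and keep the distinct documents per word.
inverted_index = map_reduce(
    docs,
    mapper=lambda doc: ((word, doc[0]) for word in doc[1].split()),
    reducer=lambda word, doc_ids: sorted(set(doc_ids)),
)

# Filtering is map-only: each record is kept or dropped, no grouping needed.
spark_docs = [doc_id for doc_id, text in docs if "spark" in text.split()]

print(word_counts)
print(inverted_index["is"])
print(spark_docs)
```

On a real framework the map and reduce calls run on many machines; the logic of each pattern stays the same.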
Map/Reduce
- The traditional parallel programming paradigm
- Issues with traditional parallel programming
- Introduction to Map/Reduce
- Thinking in Map/Reduce
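To make "thinking in Map/Reduce" concrete, the sketch below splits a word count into explicit map, shuffle, and reduce phases. It is a single-process simulation, not a framework: the function names, the two input splits, and the choice of `zlib.crc32` as a stable partitioning hash are all illustrative assumptions. The point is that mappers work on independent splits and each reducer owns a disjoint set of keys, which is what lets both phases run in parallel.

```python
import zlib
from collections import defaultdict


def map_phase(split):
    """Mapper: turn one input split (a list of lines) into (word, 1) pairs."""
    return [(word.lower(), 1) for line in split for word in line.split()]


def partition(key, num_reducers):
    """Route a key to one reducer; crc32 keeps the assignment stable across runs."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(mapped_outputs, num_reducers):
    """Group all mappers' pairs by key, with one bucket per reducer."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for pairs in mapped_outputs:
        for key, value in pairs:
            buckets[partition(key, num_reducers)][key].append(value)
    return buckets


def reduce_phase(bucket):
    """Reducer: sum the counts collected for each key in its bucket."""
    return {key: sum(values) for key, values in bucket.items()}


# Hypothetical input, already divided into splits (one per mapper).
splits = [["the quick brown fox"], ["the lazy dog", "The end"]]

mapped = [map_phase(split) for split in splits]  # each mapper is independent
buckets = shuffle(mapped, num_reducers=2)

result = {}
for bucket in buckets:  # each reducer is independent as well
    result.update(reduce_phase(bucket))  # keys never overlap between buckets

print(sorted(result.items()))
```

Compared with hand-written parallel code, the programmer only supplies `map_phase` and `reduce_phase`; distribution, grouping, and partitioning belong to the framework.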
Introduction to Spark
- What is Spark?
- Differences from Pig and Hive
- Installing Spark
- Where to use Spark
- Linking with Spark
- Using the Shell
Resilient Distributed Datasets (RDDs)
- Parallelized Collections
- External Datasets
- RDD Operations
  - Basics
  - Passing Functions to Spark
  - Closures
  - Working with Key-Value Pairs
  - Transformations
  - Actions
  - Shuffle Operations
RDD Persistence
- Which Storage Level to Choose?
- Removing Data
Shared Variables
- Broadcast Variables
- Accumulators
Deploying to a Cluster
- Launching Applications with spark-submit
- Advanced Dependency Management
Launching Spark jobs from Java/Scala
- SparkLauncher
- SparkAppHandle
Using Spark with Different Languages
- Scala
- Java
- Python
- R
PySpark and Jupyter
- Introduction to PySpark
- Using Jupyter Notebook
- Using the PySpark shell
- PySpark in Jupyter