Course outline for Hadoop, HBase, Pig, Hive
Hadoop, HBase, Pig, Hive Course Objectives
The course is targeted at application developers who are currently evaluating Hadoop for their projects or intend to work with data at scale. It also covers some aspects of Hadoop administration, including an overview of Hadoop cluster setups.
Pre-requisites for learning Hadoop, HBase, Pig, Hive
- Good knowledge of Java and Eclipse
- Must be comfortable in a Linux environment
- Knowledge of web scale data challenges is preferred
Lab Setup
Hardware pre-requisites
- A minimum of 20GB of disk space and at least 4GB of RAM
- Ensure that all participants have a properly functioning Internet connection
Software pre-requisites
- Ubuntu 20.04/22.04 Desktop Edition
- JDK 1.8
- Eclipse Oxygen Java or JEE Edition
Duration
5 days
Training Mode
Online training for Hadoop, HBase, Pig, Hive
We provide:
- Instructor-led live training
- Self-paced learning with access to expert coaches
- 24x7 access to cloud labs with end-to-end working examples
All jnaapti sessions are 100% hands-on. All our instructors are engineers at heart. Activities are derived from real-life problems faced by our expert faculty. Self-paced hands-on sessions are delivered via Virtual Coach.
Classroom training for Hadoop, HBase, Pig, Hive
Classroom sessions are conducted at client locations in:
- Bengaluru
- Chennai
- Hyderabad
- Mumbai
- Delhi/Gurgaon/NCR
Note: Classroom training is for corporate clients only
Detailed Course Outline for Hadoop, HBase, Pig, Hive
Overview
- Dealing with web-scale data
- How big is big data?
- Where is the data?
- A few use-cases for Hadoop
- When to and when not to use Hadoop
Introduction to the Hadoop Ecosystem
- Hadoop architectural overview
- Companies and products related to Hadoop
- Hadoop Sub-projects
- Downloading and installing Hadoop
Hadoop "Hello World"
- Setting up a "one-node" cluster
- Running a simple example
- Using Eclipse to develop Hadoop programs
- Executing the provided examples
Beyond a Single System
- Setting up a 2-node cluster
- Setting up multi-node clusters
HDFS
- What is a distributed filesystem?
- Introduction to Hadoop DFS
- HDFS Concepts - Blocks, Namenodes and Datanodes
- Configuring HDFS
- Interacting with HDFS
- Using the HDFS web interface
Map/Reduce
- The traditional parallel programming paradigm
- Issues with traditional parallel programming
- Introduction to Map/Reduce
- Thinking in Map/Reduce
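To give a flavour of "thinking in Map/Reduce", here is a minimal sketch of a word count written in plain Java, with no Hadoop dependency. The class name `WordCountSketch` and its methods are hypothetical names chosen for illustration, and the three phases are simulated in a single process; this is not the Hadoop API, which the course labs cover separately.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: simulates the map, shuffle and reduce phases in memory.
public class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group all emitted values by key (sorted by key).
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the values collected for a single key.
    static int reduce(List<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return sum;
    }

    // Driver: run map over every line, then shuffle, then reduce per key.
    static Map<String, Integer> countWords(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(emitted).entrySet()) {
            result.put(group.getKey(), reduce(group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints {at=1, big=2, data=2, is=1, scale=1}
        System.out.println(countWords(Arrays.asList("big data is big", "data at scale")));
    }
}
```

On a real cluster the map and reduce steps would run on different nodes and the framework would perform the shuffle; the sketch only shows how a problem is split into independent per-record and per-key steps.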
Map/Reduce Algorithms
- Map-Only
- Sorting
- Inverted Indexes
- Counting and Summing
- Filtering
- Trying out a few examples
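As one example of the algorithms listed above, an inverted index can be sketched in plain Java as follows. The class `InvertedIndexSketch` and the document IDs are hypothetical and chosen for illustration; the map and reduce steps are folded into a single in-memory loop rather than run as a Hadoop job.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustrative only: builds a word -> document IDs index in memory.
public class InvertedIndexSketch {

    // documents maps a document ID to that document's text.
    static Map<String, Set<String>> buildIndex(Map<String, String> documents) {
        Map<String, Set<String>> index = new TreeMap<>();
        for (Map.Entry<String, String> doc : documents.entrySet()) {
            // "Map" step: split the text and emit a (word, docId) pair per word.
            for (String word : doc.getValue().toLowerCase().split("\\W+")) {
                if (word.isEmpty()) {
                    continue;
                }
                // "Reduce" step: collect the distinct, sorted doc IDs for each word.
                index.computeIfAbsent(word, k -> new TreeSet<>()).add(doc.getKey());
            }
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("d1", "Hadoop stores data");
        docs.put("d2", "Pig queries data");
        // Prints {data=[d1, d2], hadoop=[d1], pig=[d2], queries=[d2], stores=[d1]}
        System.out.println(buildIndex(docs));
    }
}
```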
Using Hadoop Commands
- Commands overview
- User commands
- Admin commands
Hadoop best practices and use cases
- A look at some high-level use cases and how Hadoop and Map/Reduce can be used to solve them
- Twitter Stream Analysis
Hadoop Cluster Setup
- SSH Configuration
- Hadoop Configuration
- Verifying the cluster setup
- Using Hadoop via Amazon EMR
Pig
- Why Pig?
- Installing Pig
- Running Pig locally
- Running Pig on a Hadoop Cluster
- The Pig Console (grunt)
- The Pig Data Model
- Pig Latin - Input and Output, Relational Operations and User Defined Functions
Hive
- What is Hive?
- Installation
- Configuration
- Data Definition Language – Tables, Views, Indexes
- Data Manipulation Language
- Handling Joins in Hive
HBase
- Installation of HBase
- Running HBase
- MapReduce integration
- Understanding the HBase architecture
- Cluster setup