Course outline for Hadoop, HBase, Pig, Hive

Hadoop, HBase, Pig, Hive Course Objectives

The course is targeted at application developers who are currently evaluating the use of Hadoop in their projects or intend to work with data at scale. The course also covers some aspects of Hadoop administration, including an overview of Hadoop cluster setups.

Pre-requisites for learning Hadoop, HBase, Pig, Hive

  • Good knowledge of Java and Eclipse
  • Must be comfortable in a Linux environment
  • Knowledge of web scale data challenges is preferred

Lab Setup

Hardware pre-requisites

  • A minimum of 20GB of disk space and at least 4GB of RAM
  • Ensure that all participants have a properly functioning Internet connection

Software pre-requisites

  • Ubuntu 20.04/22.04 Desktop Edition
  • JDK 1.8
  • Eclipse Oxygen Java or JEE Edition

Duration

5 days

Training Mode

Online training for Hadoop, HBase, Pig, Hive

We provide:

  • Instructor-led live training
  • Self-paced learning with access to expert coaches
  • 24x7 access to cloud labs with end-to-end working examples

All jnaapti sessions are 100% hands-on. All our instructors are engineers at heart. Activities are derived from real-life problems faced by our expert faculty. Self-paced hands-on sessions are delivered via Virtual Coach.

Classroom training for Hadoop, HBase, Pig, Hive

Classroom sessions are conducted at client locations in:

  • Bengaluru
  • Chennai
  • Hyderabad
  • Mumbai
  • Delhi/Gurgaon/NCR

Note: Classroom training is for corporate clients only

Detailed Course Outline for Hadoop, HBase, Pig, Hive

Overview

  • Dealing with web-scale data
  • How big is big data?
  • Where is the data?
  • A few use-cases for Hadoop
  • When to and when not to use Hadoop

Introduction to the Hadoop Ecosystem

  • Hadoop architectural overview
  • Companies and products related to Hadoop
  • Hadoop Sub-projects
  • Downloading and installing Hadoop

Hadoop "Hello World"

  • Setting up a "one-node" cluster
  • Running a simple example (a sketch follows this list)
  • Using Eclipse to develop Hadoop programs
  • Executing the provided examples
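
The word-count job is the customary Hadoop "Hello World". Below is a minimal sketch of such a job against the org.apache.hadoop.mapreduce API; the class names and command-line paths are illustrative, and the lab exercise may differ in detail.

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

      // Mapper: emits (word, 1) for every token in the input line
      public static class TokenizerMapper
          extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
          StringTokenizer itr = new StringTokenizer(value.toString());
          while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, ONE);
          }
        }
      }

      // Reducer: sums the per-word counts produced by the mappers
      public static class IntSumReducer
          extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable val : values) {
            sum += val.get();
          }
          result.set(sum);
          context.write(key, result);
        }
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

The class is typically packaged into a jar and submitted with the hadoop jar command, passing an HDFS input directory and a not-yet-existing output directory as arguments.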

Beyond a Single System

  • Setting up a 2-node cluster
  • Setting up multi-node clusters

HDFS

  • What is a distributed filesystem?
  • Introduction to Hadoop DFS
  • HDFS Concepts - Blocks, NameNodes and DataNodes
  • Configuring HDFS
  • Interacting with HDFS (see the sketch after this list)
  • Using the HDFS web interface
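
As a taste of interacting with HDFS programmatically, here is a small sketch that writes a file into HDFS and reads it back through the org.apache.hadoop.fs.FileSystem API; the path /user/demo/hello.txt and the single-node fs.defaultFS setting are assumptions about the lab environment.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.nio.charset.StandardCharsets;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsHello {
      public static void main(String[] args) throws Exception {
        // Picks up fs.defaultFS from core-site.xml on the classpath
        // (e.g. hdfs://localhost:9000 on the single-node lab cluster)
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path("/user/demo/hello.txt");   // hypothetical path

        // Write a small file into HDFS (overwrite if it already exists)
        try (FSDataOutputStream out = fs.create(file, true)) {
          out.write("Hello, HDFS!\n".getBytes(StandardCharsets.UTF_8));
        }

        // Read it back and print its contents
        try (BufferedReader in = new BufferedReader(
            new InputStreamReader(fs.open(file), StandardCharsets.UTF_8))) {
          String line;
          while ((line = in.readLine()) != null) {
            System.out.println(line);
          }
        }
      }
    }

The same operations are available from the command line via hadoop fs and from the HDFS web interface.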

Map/Reduce

  • The traditional parallel programming paradigm
  • Issues with traditional parallel programming
  • Introduction to Map/Reduce
  • Thinking in Map/Reduce

Map/Reduce Algorithms

  • Map-Only
  • Sorting
  • Inverted Indexes
  • Counting and Summing
  • Filtering
  • Trying out a few examples (a map-only filtering sketch follows this list)
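
One sketch of a map-only algorithm from this list: a filtering job that keeps only the input lines containing a given term and sets the number of reducers to zero, so mapper output is written directly to HDFS. The class name and the filter.term configuration key are made up for illustration.

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class GrepFilter {

      // Map-only job: keep lines that contain the search term, drop the rest
      public static class FilterMapper
          extends Mapper<Object, Text, Text, NullWritable> {
        private String term;

        @Override
        protected void setup(Context context) {
          term = context.getConfiguration().get("filter.term", "ERROR");
        }

        @Override
        public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
          if (value.toString().contains(term)) {
            context.write(value, NullWritable.get());
          }
        }
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("filter.term", args[2]);          // term passed on the command line
        Job job = Job.getInstance(conf, "grep filter");
        job.setJarByClass(GrepFilter.class);
        job.setMapperClass(FilterMapper.class);
        job.setNumReduceTasks(0);                  // map-only: no shuffle, no reduce
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

Sorting, inverted indexes and counting/summing follow the same Job skeleton with different map and reduce functions.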

Using Hadoop Commands

  • Commands overview
  • User commands
  • Admin commands

Hadoop best practices and use cases

  • A look at some high-level use cases and how Hadoop/MapReduce can be used to solve them
  • Twitter Stream Analysis

Hadoop Cluster Setup

  • SSH Configuration
  • Hadoop Configuration
  • Verifying the cluster setup
  • Using Hadoop via Amazon EMR

Pig

  • Why Pig?
  • Installing Pig
  • Running Pig locally
  • Running Pig on a Hadoop Cluster
  • The Pig Console (grunt)
  • The Pig Data Model
  • Pig Latin - Input and Output, Relational Operations and User Defined Functions (see the sketch below)
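
To keep all examples in Java, here is a sketch that drives Pig Latin from Java through Pig's embedding API (PigServer) in local mode; the input file users.txt and its field layout are assumptions.

    import java.util.Iterator;

    import org.apache.pig.ExecType;
    import org.apache.pig.PigServer;
    import org.apache.pig.data.Tuple;

    public class PigLocalDemo {
      public static void main(String[] args) throws Exception {
        // Run Pig in local mode (use ExecType.MAPREDUCE to submit to a cluster)
        PigServer pig = new PigServer(ExecType.LOCAL);

        // Hypothetical input: a tab-separated file of (name, age) records
        pig.registerQuery("users = LOAD 'users.txt' AS (name:chararray, age:int);");
        pig.registerQuery("adults = FILTER users BY age >= 18;");

        // Iterate over the tuples of the 'adults' relation
        Iterator<Tuple> it = pig.openIterator("adults");
        while (it.hasNext()) {
          System.out.println(it.next());
        }
        pig.shutdown();
      }
    }

The same two statements can be typed directly at the grunt prompt; the embedding API is useful when Pig needs to be invoked from an existing Java application.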

Hive

  • What is Hive?
  • Installation
  • Configuration
  • Data Definition Language – Tables, Views, Indexes
  • Data Manipulation Language (see the sketch after this list)
  • Handling Joins in Hive
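
A sketch of running HiveQL DDL and DML from Java over JDBC against a HiveServer2 instance; the connection URL, the pageviews table and the local data file are assumptions about the lab setup.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HiveJdbcDemo {
      public static void main(String[] args) throws Exception {
        // Load the HiveServer2 JDBC driver (hive-jdbc must be on the classpath)
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        // HiveServer2 is assumed to run locally on its default port
        String url = "jdbc:hive2://localhost:10000/default";
        try (Connection con = DriverManager.getConnection(url, "hive", "");
             Statement stmt = con.createStatement()) {

          // DDL: create a simple managed table
          stmt.execute("CREATE TABLE IF NOT EXISTS pageviews ("
              + "url STRING, hits INT) "
              + "ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'");

          // DML: load a local file, then run an aggregation query
          stmt.execute("LOAD DATA LOCAL INPATH '/tmp/pageviews.tsv' "
              + "OVERWRITE INTO TABLE pageviews");
          try (ResultSet rs = stmt.executeQuery(
              "SELECT url, SUM(hits) AS total FROM pageviews GROUP BY url")) {
            while (rs.next()) {
              System.out.println(rs.getLong("total") + "\t" + rs.getString("url"));
            }
          }
        }
      }
    }

The same statements can be run interactively from the Hive shell or Beeline; JDBC is used here only to keep the examples in Java.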

HBase

  • Installation of HBase
  • Running HBase (see the sketch after this list)
  • MapReduce integration
  • Understanding the HBase architecture
  • Cluster setup
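
A small sketch of basic reads and writes through the HBase Java client API; the users table and its info column family are assumptions and are expected to exist already (for example, created beforehand from the HBase shell).

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseHello {
      public static void main(String[] args) throws Exception {
        // Reads hbase-site.xml from the classpath to locate the cluster
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("users"))) {

          // Write one cell: row key "user1", column info:name
          Put put = new Put(Bytes.toBytes("user1"));
          put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"),
              Bytes.toBytes("Alice"));
          table.put(put);

          // Read it back
          Get get = new Get(Bytes.toBytes("user1"));
          Result result = table.get(get);
          byte[] name = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"));
          System.out.println("info:name = " + Bytes.toString(name));
        }
      }
    }

Scans and MapReduce integration (TableMapper/TableReducer) build on the same Connection and Table abstractions.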