BIG DATA - HADOOP Course Content
Introduction
BIG DATA - HADOOP (Development & Basic Administration)
What You Will Get From This Course?
- In-depth understanding of the entire Big Data Hadoop framework and the Hadoop Ecosystem
- Real-time exposure to Hadoop Development
- Detailed Course Materials
- Free Core Java and UNIX Fundamentals
- Interview Oriented Discussions
- Get Ready for Hadoop & Spark Developer (CCA175) Certification Exam
Overall Course Structure:
- UNIX/LINUX Basic Commands
- Basic UNIX Shell Scripting
- Basic Java Programming – Core JAVA OOP Concepts
- Introduction to Big Data and Hadoop
- Working With HDFS
- Hadoop Map Reduce Concepts & Features
- Developing Map Reduce Applications
- Hadoop Ecosystem Components:
- HIVE
- PIG
- HBASE
- FLUME
- SQOOP
- OOZIE
- Introduction to SPARK & SCALA
- Real-Time Tools such as PuTTY, WinSCP, Eclipse, Hue and Cloudera Manager
Pre-Requisite:
- Basic SQL Knowledge
- Computer with a Minimum of 4 GB RAM (8 GB RAM Preferred)
- Basic UNIX & Java programming knowledge is an added advantage
Detailed Course Structure:
Introduction to Big Data & Hadoop
- The Big Data Problem
- What is Big Data?
- Challenges in processing Big Data
- What is Hadoop?
- Why Hadoop?
- History of Hadoop
- Hadoop Components Overview
- HDFS
- Map Reduce
- Hadoop Ecosystem Introduction
- NoSQL Database Introduction
Understanding Hadoop Architecture
- Hadoop 2.x Architecture
- Introduction to YARN
- Hadoop Daemons
- YARN Architecture
- Resource Manager
- Application Master
- Node Manager
Introduction to HDFS (Hadoop Distributed File System)
- Rack Awareness
- HDFS Daemons
- Writing Files to HDFS
- Blocks & Splits
- Input Splits
- Data Replication
- Reading Files from HDFS
- Introduction to HDFS Configuration Files
Working with HDFS
- HDFS Commands
- Accessing HDFS
- CLI Approach
- JAVA Approach [Introducing HDFS JAVA API]
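For the Java approach above, a minimal sketch using the HDFS Java API; the NameNode address and file paths are assumptions for illustration only.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Minimal HDFS Java API sketch: write a small file to HDFS, then read it back.
// The fs.defaultFS value and the file path below are illustrative assumptions.
public class HdfsReadWrite {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:8020");   // assumed NameNode address

        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/user/training/sample.txt");    // assumed HDFS path

        // Write: create() returns an FSDataOutputStream
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.write("hello hdfs".getBytes(StandardCharsets.UTF_8));
        }

        // Read: open() returns an FSDataInputStream
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(fs.open(file), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
        fs.close();
    }
}
```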
Introduction to Map Reduce Paradigm
- What is Map Reduce?
- Detailed Map Reduce Flow
- Introduction to Key/Value Approach
- Detailed Mapper Functionality
- Detailed Reducer Functionality
- Details of Partitioner
- Shuffle & Sort Process
- Understanding Map Reduce Flow with Word Count Example
Map Reduce Programming
- Introduction to Map Reduce API [New Map Reduce API]
- Map Reduce Data Types
- File Formats
- Input Formats – Input Splits & Records, Text Input, Binary Input
- Output Formats – Text Output, Binary Output
- Configuring Development Environment – Eclipse
- Developing a Map Reduce Application using Default Functionality
- Identity Mapper
- Identity Reducer
- ToolRunner API Introduction
- Developing Word Count Application
- Writing Mapper, Reducer & Driver Code (sample sketch at the end of this section)
- Building Application
- Deploying Application
- Running the Map Reduce Application
- Local Mode of Execution
- Cluster Mode of Execution
- Monitoring Map Reduce Application
- Map Reduce Combiner
- Map Reduce Counters
- Map Reduce Partitioner
- File Merge Utility
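As a reference for the Word Count items above, a minimal sketch of the Mapper, Reducer and Driver against the new (org.apache.hadoop.mapreduce) API; class names are illustrative.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: emit (word, 1) for every token in the input line.
    public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reducer: sum the counts for each word.
    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    // Driver: configure and submit the job; input/output paths come from the command line.
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenMapper.class);
        job.setCombinerClass(SumReducer.class);   // combiner reuses the reducer logic
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Packaged as a jar, this would typically be submitted with something like hadoop jar wordcount.jar WordCount <input path> <output path>, either in local mode or on the cluster.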
Programming with HIVE
- Introduction to HIVE
- Hive Architecture
- Types of Metastore
- Introduction to Hive Configuration Files
- Hive Data Types
- Simple Data Types
- Collection Data Types
- Types of Hive Tables
- Managed Table
- External Table
- Hive Query Language (HiveQL or HQL)
- Creating Databases
- Creating Tables
- Joins in Hive
- Group BY and Distinct operations
- Partitioning
- Static Partitioning
- Dynamic Partitioning
- Bucketing
- Lateral View & Explode [Introduction to Hive UDFs: UDF, UDAF & UDTF]
- XML Processing in HIVE
- JSON processing in HIVE
- URL Processing in HIVE
- Hive File Formats [Introduction to Hive SERDE]
- Parquet
- ORC
- AVRO
- Introduction to HIVE Query Optimizations
- Developing Hive UDFs in JAVA
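For the item above, a minimal sketch of a Hive UDF in Java using the classic org.apache.hadoop.hive.ql.exec.UDF base class; the class name and behavior are assumptions for illustration.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Minimal Hive UDF sketch: trims and lower-cases a string column.
// Class name and behavior are illustrative, not part of the course material.
public class LowerTrimUDF extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        return new Text(input.toString().trim().toLowerCase());
    }
}
```

Once built into a jar, such a function would normally be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being called in a query.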
Programming with PIG
- Introduction to PIG
- PIG Architecture
- Introduction to PIG Configuration Files
- PIG vs. HIVE vs. Map Reduce
- Introduction to Data Flow Language
- Pig Data Types
- Pig Programming Modes
- Pig Access Modes
- Detailed PIG Latin Programming
- PIG UDFs & UDF Development in JAVA (sample sketch at the end of this section)
- Hive & PIG Integration [Introduction to HCATALOG]
- Introduction to PIG Optimization
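For the PIG UDF item above, a minimal sketch of a Pig UDF in Java extending EvalFunc; the class name and behavior are assumptions for illustration.

```java
import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Minimal Pig EvalFunc sketch: returns the upper-cased form of the first field.
// Class name and behavior are illustrative, not part of the course material.
public class UpperCase extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        return input.get(0).toString().toUpperCase();
    }
}
```

In a Pig Latin script the jar would typically be loaded with REGISTER and the function then called like any built-in.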
NoSQL & HBASE
- Introduction to NoSQL Databases
- Types of NoSQL Databases
- Introduction To HBASE
- HBASE Architecture
- HBASE Shell Interface
- Creating Databases and Tables
- Inserting Data in Tables
- Accessing Data from Tables (sample Java client sketch at the end of this section)
- HBase Filters
- Hive & HBASE Integration
- PIG & HBASE Integration
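To illustrate inserting and reading data from a Java client, a minimal sketch against the HBase client API; the table name, column family and qualifier are assumptions, and the table is expected to already exist (for example, created from the HBase shell).

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// Minimal HBase Java client sketch: put one cell, then read it back.
// Table name "employee", column family "info" and qualifier "name" are
// illustrative assumptions.
public class HBasePutGet {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("employee"))) {

            // Insert: row key "emp1", column family "info", qualifier "name"
            Put put = new Put(Bytes.toBytes("emp1"));
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Asha"));
            table.put(put);

            // Read the same cell back
            Get get = new Get(Bytes.toBytes("emp1"));
            Result result = table.get(get);
            byte[] name = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"));
            System.out.println(Bytes.toString(name));
        }
    }
}
```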
Introduction to Streaming & FLUME
- Introduction to Streaming
- Introduction to FLUME
- FLUME Architecture
- Flume Agent Setup
- Types of Sources, Channels & Sinks
- Developing Sample Flume Applications
- Introduction to KAFKA
SQOOP
- Introduction to SQOOP
- Connecting to RDBMS Using SQOOP
- SQOOP Import
- Import to HDFS
- Import to HIVE
- Import to HBASE
- Bulk Import
- Full Table
- Subset of a Table
- All Tables in the DB
- Incremental Import
- SQOOP Export
- Export from HDFS
- Export from Hive
SPARK & SCALA
- Scala Programming Basics
- Apache Spark Basics
- Using Spark Shell
- Spark RDD
- RDD Overview
- RDD Data Sources
- Creating and Saving RDDs
- RDD Operations
- Pair RDD and Pair RDD Operations
- Concept of Persistence
- Spark Data Frames
- What is a Data Frame?
- Creating Data Frame from Data Sources (including converting RDD to Data Frames)
- Data Frame Operations (sample sketch at the end of this section)
- Using Column Expressions
- Grouping and Aggregation
- Joining Data Frames
- Concept of Persistence
- Spark SQL
- Querying Tables in Spark using Spark SQL
- Querying Files and Views
- Spark Streaming
- Integrating Spark Streaming with Flume & Kafka
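The course introduces the Spark topics above together with Scala; purely for illustration, a minimal sketch of the Data Frame and Spark SQL operations through Spark's equivalent Java API, where the input file and the column names (dept, salary) are assumptions.

```java
import static org.apache.spark.sql.functions.col;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

// Minimal Spark Data Frame sketch using Spark's Java API.
// The input path and the dept/salary column names are illustrative assumptions.
public class SparkDataFrameDemo {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("DataFrameDemo")
                .master("local[*]")           // local mode for a quick test
                .getOrCreate();

        // Create a Data Frame from a JSON data source
        Dataset<Row> employees = spark.read().json("employees.json");

        // Column expressions plus grouping and aggregation
        employees.filter(col("salary").gt(50000))
                 .groupBy(col("dept"))
                 .count()
                 .show();

        // Register a temporary view and query it with Spark SQL
        employees.createOrReplaceTempView("employees");
        spark.sql("SELECT dept, AVG(salary) FROM employees GROUP BY dept").show();

        spark.stop();
    }
}
```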