Big Data Hadoop Specialist

Course Structure

Course Objectives

The Big Data Hadoop Certification course has been developed to impart an in-depth knowledge of Big Data processing using Hadoop and Spark. It builds a strong conceptual foundation and provides hands-on training on components of Hadoop ecosystem like HDFS, MapReduce, YARN, HBase, Hive, Pig, Impala, Oozie, Zookeeper, Sqoop, Flume and Spark.

As part of the program you will work on real-life industry-based projects across multiple domains like social media, retail, entertainment and e-commerce using our Learning Management System. You will work on multiple assignments, case studies and practice exercises to hone your skills.

At the end of the course, you will be able to work on Big Data technologies and build real-world industry solutions.

Course duration: 150 hours
Mode: Classroom / Online Instructor Led

Key Features

  • 60 hours of instructor-led live training on weekends
  • Hands-on practice on 5 real-life industry projects
  • Lifetime access to our Learning Management System
  • Personal attention from our faculty
  • Performance evaluation
  • Placement assistance through our wide industry network
  • 90 hours of self-learning
  • Practice exercises and assignments to enhance skills
  • Faculty from IITs/IIMs with rich industry experience
  • Full access to video lectures for self-paced learning
  • 100% money-back guarantee
  • Virtual lab to work on projects, assignments and exercises
    Introduction to Big Data
    Realizing how much data is out there and its key sources.
    How do you store and analyse such large datasets? A comparison of the traditional vs the big data ecosystem. A quick look at other computing environments.
    Case Study - Interesting applications of Big Data
    Hours - 1.5
    Introduction to Hadoop
    What is Apache Hadoop? What are the components of Hadoop ecosystem?
    Hadoop architecture. Reading and writing data in Hadoop. Hadoop data processing.
    What is the meaning of different Hadoop releases? How do you choose which Hadoop release to use? Compatibility issues among different releases. Hadoop 2.x components
    What are Hadoop distributions?
    Limitations of Hadoop
    Case study - None
    Hours - 1.5
    Setting up & installing Hadoop
    Installing Java
    Installing Hadoop on Ubuntu
    Installing Hadoop on Windows using Virtual Box
    Installing Hadoop using Virtual Machine Image
    Setting up Hadoop in standalone mode (to work on your local machine)
    Setting up Hadoop in pseudo-distributed mode (to simulate clusters on your local machine)
    Setting up Hadoop in fully distributed mode on the cloud. Understanding key concepts like cluster specification, network topology & rack awareness.
    Cluster configuration & cluster sizing
    Case Study - Building your first Hadoop Application
    Hours - 2
    Understanding the distributed file system: HDFS
    HDFS design. HDFS components - Blocks, Namenodes & Datanodes, HDFS Federation, HDFS High Availability
    Reading & writing data in HDFS
    Performing basic file system operations using command prompt
    All about file system commands
    Data compression
    Case Study 1 - HDFS File Slurper to copy unstructured and binary files into HDFS
    Case Study 2 - HDFS File Slurper to automate file copying from HDFS
    Case Study 3 - Picking the right compression codec for your data
    Hours - 2
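The block-and-replica idea covered in this module can be sketched in a few lines of Python. This is purely illustrative: the block size, replication factor and datanode names below are made up for the demo (real HDFS defaults to 128 MB blocks and 3 replicas, with placement decided by the namenode).

```python
# Illustrative sketch of HDFS-style block splitting and replica placement.
# BLOCK_SIZE and DATANODES are hypothetical demo values, not HDFS defaults.
BLOCK_SIZE = 8          # bytes, tiny for demonstration (HDFS default: 128 MB)
REPLICATION = 3
DATANODES = ["dn1", "dn2", "dn3", "dn4"]

def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Split a byte string into fixed-size blocks, the way HDFS splits a file."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_replicas(num_blocks: int):
    """Assign each block to REPLICATION distinct datanodes (round-robin sketch)."""
    return {b: [DATANODES[(b + r) % len(DATANODES)] for r in range(REPLICATION)]
            for b in range(num_blocks)}

data = b"hello hadoop distributed file system"
blocks = split_into_blocks(data)
print(len(blocks))                     # 5 blocks for this 36-byte "file"
print(place_replicas(len(blocks))[0])  # the 3 datanodes holding block 0
```

The namenode's real placement policy is rack-aware rather than round-robin, which is exactly the "network topology & rack awareness" topic from the previous module.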
    Understanding the framework for processing data using MapReduce & YARN
    Understanding the MapReduce concept using a case study that highlights the difference between serial processing & parallel processing
    Understanding all the phases of MapReduce application flow - Map phase, Shuffle phase, Reduce phase.
    Understanding the fundamental classes. Writing a simple MapReduce Application.
    Understanding YARN architecture - resource manager, node manager & application master.
    Note: Heavy emphasis is not placed on Java programming. We will learn to build such applications using higher-level tools like Pig and Hive, which hide the complexities of MapReduce.
    Case Study - Writing MapReduce Applications in Java.
    Hours - 2
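The three phases of the application flow above (map, shuffle, reduce) can be mimicked in plain Python with the classic word-count example. This is a conceptual sketch, not the Hadoop Java API: in a real job the framework runs the map and reduce functions on different nodes and performs the shuffle over the network.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle_phase(mapped):
    """Shuffle: group all emitted values by key, as the framework does
    between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: sum the counts collected for one word."""
    return key, sum(values)

lines = ["Hadoop stores data", "Spark and Hadoop process data"]
mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle_phase(mapped).items())
print(counts["hadoop"])  # 2
```

The same logic, expressed in Pig or Hive later in the course, collapses to a few lines because those tools generate the map/shuffle/reduce plumbing for you.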
    Data Serialization
    Understanding input & output in MapReduce
    Processing common serialization formats - XML & JSON
    Processing big data serialization formats - SequenceFiles, Protocol Buffers, Thrift, and Avro
    Case Study 1- Working with MapReduce & XML
    Case Study 2- Working with MapReduce & JSON
    Case Study 3- Working with SequenceFiles
    Case Study 4 - Integrating Protocol Buffers with MapReduce
    Case Study 5 - Working with Thrift
    Case Study 6 - Avro with MapReduce
    Case Study 7 - Writing input and output formats for CSV
    Hours - 3
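As a small taste of Case Study 2 (MapReduce & JSON), here is how newline-delimited JSON records are typically parsed one line at a time. The record fields below are hypothetical; the skip-on-error pattern mirrors how a MapReduce job would count and skip malformed records rather than fail.

```python
import json

# Hypothetical newline-delimited JSON input, one record per line.
raw = "\n".join([
    '{"user": "alice", "clicks": 3}',
    '{"user": "bob", "clicks": 5}',
])

def parse_records(text):
    """Parse one JSON record per line; skip lines that fail to parse."""
    records = []
    for line in text.splitlines():
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            continue  # a real job would increment a bad-record counter here
    return records

records = parse_records(raw)
total_clicks = sum(r["clicks"] for r in records)
print(total_clicks)  # 8
```

Formats like Avro, Thrift and Protocol Buffers exist precisely because text formats such as JSON carry no schema and compress and split poorly at scale.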
    Processing data with Pig
    Installing and running Pig
    Sample Pig code to illustrate the data processing flow
    Syntax & semantics of Pig - Structure, statements, expressions, data types, schemas, functions & macros
    User defined functions - Filter UDF, Eval UDF, Load UDF
    Data processing operators - Loading & storing data, filtering data, grouping & joining data, sorting, combining & splitting data
    Local and Distributed Modes of Running Pig scripts
    Practical considerations - Parallelism, Parameter Substitution
    Case Study 1- Compression with Pig
    Case Study 2- Splittable LZOP with Pig
    Case Study 3 - Using Pig to find malicious actors in log data
    Case Study 4 - Optimizing user workflows with Pig
    Case Study 5 - Optimize Pig by reviewing some data and network performance patterns
    Labs will be conducted to explain the programming aspects
    Hours - 7
    Data storage with HBase
    Introduction to HBase
    Understanding HBase data model
    Understanding the HBase architecture
    Install HBase
    Introduction to data modeling
    Creating a table, filling it with data & retrieving information from the table
    Tuning HBase
    Note: We will not be using Java to interact with the HBase database; Hive will be used instead.
    Case Study 1 - Getting HBase data into HDFS
    Case Study 2 - MapReduce with HBase as a data source
    Case Study 3 - HDFS egress to HBase
    Case Study 4 - Using HBase as a data sink in MapReduce
    Labs will be conducted to explain the programming aspects
    Hours - 5
    Data analysis with Hive
    Getting started with Hive - installation & configuration, using the Hive client.
    An example of using Hive
    Hive architecture & configuration. Metastore.
    Comparison with traditional databases
    Syntax & semantics of Hive Query Language - Structure, statements, expressions, data types, operators & functions
    Table manipulation with Hive - Partition & Buckets, Data importing, Table altering, Table dropping
    Data querying - sorting & aggregating, Joins (inner join, outer join, semi join, map join), Subqueries, Views
    User defined function - writing your own UDF, writing an aggregate function, writing complex aggregate functions
    Storage formats - Delimited text, Avro - Avro datatypes & schemas, In memory serialization & deserialization, Avro data files
    Case Study 1 - Loading log files using Hive
    Case Study 2 - Writing UDFs and compressed partitioned tables
    Case Study 3 - Tuning Hive joins
    Labs will be conducted to explain the programming aspects
    Hours - 7
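HiveQL is deliberately SQL-like, so the querying patterns in this module (grouping, aggregating, ordering) carry over from any SQL dialect. The sketch below runs an equivalent GROUP BY aggregation in SQLite; the table and column names are hypothetical, and Hive would of course execute the same-shaped query over files in HDFS rather than a local database.

```python
import sqlite3

# Stand-in for a Hive table of page views; names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (page TEXT, user TEXT)")
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?)",
    [("home", "a"), ("home", "b"), ("about", "a")],
)

# Same shape as a HiveQL aggregation:
#   SELECT page, COUNT(*) FROM page_views GROUP BY page ORDER BY page;
rows = conn.execute(
    "SELECT page, COUNT(*) FROM page_views GROUP BY page ORDER BY page"
).fetchall()
print(rows)  # [('about', 1), ('home', 2)]
```

The Hive-specific material in the module - partitions, buckets, storage formats, map joins - is what makes the same query scale, since it controls how much data each query actually has to scan.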
    Interactive queries with Impala
    Introduction to Impala. Pig vs Hive vs Impala
    Installing and using Impala
    Syntax & semantics of Impala
    Data management with Impala
    Data modeling with Impala
    Data partitioning
    Case Study 1 - Using Impala to access airport data, create real-time visualizations and identify the causes of flight delays
    Labs will be conducted to explain the programming aspects
    Hours - 7
    Ingesting data using Sqoop
    Installing Sqoop
    Sqoop connectors
    Importing data. A sample import from a text file. A sample import from a MySQL database. Controlling the import. Imported data and Hive. Importing large objects.
    Exporting data. Transactionality. SequenceFiles.
    Case Study 1 - Using Sqoop to import data from MySQL
    Case Study 2 -Using Sqoop to export data to MySQL
    Hours - 4
    Ingesting data from data-generating servers using Flume
    Introduction to Flume. Its major components & configuration. Flume architecture.
    Understanding data flow.
    Fetching Twitter data using Flume.
    Case Study - Pushing system log messages into HDFS with Flume
    Hours - 2
    Statistical analysis in Hadoop
    Introduction to R
    Installing R connector for Hadoop
    Interacting with HDFS from R
    Performing predictive analytics on Hadoop using R
    Case 1 - Calculate the daily mean for stocks (streaming data example)
    Case 2 - Calculate the cumulative moving average for stocks (streaming data example)
    Case 3 - Calculate the cumulative moving average for stocks using Rhipe
    Case 4 - Calculate the cumulative moving average for stocks using RHadoop
    Hours - 7
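The statistic at the heart of Cases 2-4 - the cumulative moving average over a stream of stock prices - is simple enough to sketch outside Hadoop. The version below is plain Python with a hypothetical price list; Rhipe/RHadoop distribute the same computation across the cluster, grouped per stock symbol.

```python
def cumulative_moving_average(prices):
    """Return the running mean after each new price arrives:
    cma_n = (p_1 + ... + p_n) / n."""
    averages, total = [], 0.0
    for i, price in enumerate(prices, start=1):
        total += price
        averages.append(total / i)
    return averages

# Hypothetical stream of closing prices for one stock
print(cumulative_moving_average([10.0, 20.0, 30.0]))  # [10.0, 15.0, 20.0]
```

Because only the running total and count need to be carried forward, this statistic suits the streaming examples well: each mapper can emit partial (sum, count) pairs that the reducer combines.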
    Data processing with Spark
    Introduction to Spark, Spark vs Hadoop.
    Installing Spark and setting up the cluster. Running Spark on a single machine. Running and deploying Spark on the cloud.
    Using Spark Shell. Spark Web UI. Loading a simple text file. Building a predictive model using Spark.
    Running a Spark application. About Maven.
    Loading & saving data using Spark. Functions for data manipulation.
    Using Spark with Hive. Using Hive queries in Spark.
    Extracting and analyzing data from Twitter using Spark Streaming.
    Overview of GraphX module in Spark. Creating graphs with GraphX.
    Machine learning with Spark
    Case 1 - Building a recommender system
    Case 2 - Building a Spam Detection algorithm
    Case 3 - Performing K-Means clustering
    Hours - 7
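To give a feel for Case 3 (K-Means clustering), here is a tiny one-dimensional K-means iteration in plain Python. The data points and initial centroids are hypothetical; Spark's MLlib runs this same assign-then-recompute loop in parallel across the cluster rather than in a single process.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Tiny 1-D K-means sketch: assign each point to its nearest centroid,
    then move each centroid to the mean of its cluster. Repeat."""
    for _ in range(iterations):
        clusters = {c: [] for c in range(len(centroids))}
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda c: abs(p - centroids[c]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its assigned points;
        # keep the old centroid if a cluster ends up empty.
        centroids = [sum(v) / len(v) if v else centroids[c]
                     for c, v in clusters.items()]
    return sorted(centroids)

# Two well-separated groups; initial centroids are arbitrary guesses.
print(kmeans_1d([1.0, 2.0, 3.0, 10.0, 11.0, 12.0], centroids=[0.0, 5.0]))
# -> [2.0, 11.0], the means of the two groups
```

The recommender and spam-detection cases follow the same pattern: a well-known algorithm whose per-record work is farmed out over an RDD or DataFrame.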
    Scheduling application workflows with Oozie
    Setting up the server. Installing Oozie. Oozie - workflow engine. Example MapReduce action.
    Developing and running an Oozie workflow. Oozie Components.
    Scheduling and coordinating Oozie workflows. Demo on Oozie Workflow
    Oozie Co-ordinator, Oozie Commands, Oozie Web Console
    Hours - 2
    Hadoop’s distributed coordination service: ZooKeeper
    Installing and Running ZooKeeper
    An example of ZooKeeper. Group membership. Group creation. Joining a group. Listing members of a group. Deleting a group.
    Understanding the ZooKeeper service. Data model. Operations. Implementation. Consistency. Sessions.
    ZooKeeper use cases, znodes, znode operations, znode management, cluster management, ZooKeeper leader election
    Building an application with ZooKeeper. Zookeeper in production.
    Hours - 2
    Distributed database - Cassandra
    Introduction to Cassandra. Architecture & Data Model
    Installing Cassandra
    Working with Cassandra using Shell Commands
    Keyspace operations - create, alter & drop
    Table operations - create, alter, drop, truncate, indexing, batch
    Data operations - create, read, update, delete
    Syntax & semantics - datatypes, collections, user defined datatypes
    Hours - 6
    Ad hoc requests - 4 hours

    Is this course for you?

    You should take this course if you are a:

    • Student (UG/PG) and want to build Big Data analytics skills
    • IT professional looking for a career switch to Big Data analytics
    • Job seeker who wants to start a career in Big Data
    • Big data professional who wants to learn advanced analytics techniques
    • Enthusiast who has a genuine interest in Big Data analytics and wants to grow their skills

    What are the pre-requisites of the course?

    There are no pre-requisites for this course. It starts from scratch, which makes it easy for everyone to understand while still providing in-depth knowledge.

    At the end of the course you will be entitled to the Simplify Analytics Big Data Hadoop Certificate, provided you fulfil the following terms:

    • Complete and submit at least 4 projects/case studies
    • Attend at least 85% of the sessions
    • Clear the final online test with a minimum score of 60%
    What is the mode of this training course?
    Classroom & online instructor-led. Classroom sessions are held at multiple training centres located in the Delhi-NCR region. Live online sessions are conducted through our "Virtual Classroom", which allows you to attend the course remotely from anywhere through your desktop/laptop/tablet/smartphone. A video recording of each session is provided at the end of the live session.
    Do I need to have computer programming background to take the course?
    No, you don’t need a programming background to learn analytics. The program has been designed to start from scratch, making it easy for everyone to learn.
    What if I miss a class?
    You can attend the missed session in any other live batch. You can also use the video recording of the session you missed.
    What kind of placement assistance is offered by Simplify Analytics?
    We are committed to getting you placed. All our courses include - Real life projects + Internship + Certificate + Interview QnA + Resume building & sharing + Job search guidance + Interview call assistance.
    What if I still have doubts after attending a live session?
    You can retake a class as many times as you wish across multiple batches. Also, we conduct separate doubt clearing sessions to help our students. We make sure that you understand all the concepts and are able to build solutions.
    What if I want to cancel my enrollment post registration? Will I get a refund?
    Yes, we have a 100% money-back policy, which allows you to cancel your enrollment after the first two classes (before the third class). If you are not satisfied with the program, all your money will be refunded to you.
    What are system requirements?
    You will require a laptop or workstation with a minimum of 2 GB RAM and an i3 processor (or equivalent) to practice & submit assignments. There is no constraint on the OS.
    Thank you for choosing "Big Data Hadoop Specialist" Training Program

    Course reviews
    1. Simplify Analytics - Course reviews
      5.00 out of 5

      Vaibhav Nellore

      Very knowledgeable! Asks stimulating questions. Jaydeep is too good at explaining ideas; a well-designed course with great content.

    2. Simplify Analytics - Course reviews
      5.00 out of 5

      Rohit Kumar

      Great teaching techniques help you delve into the field of analytics. Would really recommend to anyone looking for a career in analytics.