Course Overview

Introduction to Hadoop Development is a five-day, lab-intensive Hadoop course for developers. Students will learn how to use Apache Hadoop and write MapReduce programs. The course begins with a quick overview of installing Hadoop and setting it up in a cluster, then proceeds to writing data analytics programs. It presents the basic concepts of MapReduce applications developed using Hadoop, including a close look at framework components, the use of Hadoop for a variety of data analysis tasks, and numerous examples of Hadoop in action.
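
As a rough illustration of the MapReduce model described above, the following sketch counts words using plain Java collections only. It does not use the Hadoop API and is not taken from the course materials; the class name WordCountSketch and its methods map, reduce, and run are hypothetical names chosen for this example. It shows the three conceptual steps: a map step that emits (word, 1) pairs, a grouping step that collects values by key, and a reduce step that sums them.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: simulates the map, group (shuffle), and reduce steps
// in a single JVM. A real Hadoop job distributes these steps across a cluster.
public class WordCountSketch {

    // Map step: emit a (word, 1) pair for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Reduce step: sum all counts collected for one word.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return sum;
    }

    // Driver: run map on each line, group values by key, then reduce each group.
    static Map<String, Integer> run(List<String> lines) {
        // Grouping step: collect every emitted value under its key (sorted by key).
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            result.put(entry.getKey(), reduce(entry.getKey(), entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("hadoop stores data", "hadoop processes data");
        System.out.println(run(lines)); // prints {data=2, hadoop=2, processes=1, stores=1}
    }
}
```

Because each line is mapped independently and each key is reduced independently, both steps can in principle run in parallel, which is the property that lets this style of program scale to large data sets.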

The course will further examine related technologies such as Hive, Pig, and Apache Accumulo. Hive is data warehouse software for querying and managing large datasets. Pig is a platform that takes advantage of parallelization when running data analysis. Apache Accumulo is a highly scalable structured store based on Google's Bigtable; it is written in Java and operates over the Hadoop Distributed File System (HDFS). Finally, students will see how Hadoop works in, and supports, cloud computing, and will explore examples with Amazon Web Services and case studies.

Key Learning Areas

  • Learn the basics of Hadoop from a developer's perspective.
  • Use Hadoop and write MapReduce programs to process and generate large data sets in parallel.
  • Understand the larger Hadoop ecosystem by learning about related technologies such as Hive and Pig.
  • Learn how Hadoop works in cloud computing environments, with specific examples such as Amazon Web Services (AWS).

Course Outline

  • What is Hadoop?
  • Starting Hadoop
  • Components of Hadoop
  • Writing Basic MapReduce Programs
  • Advanced MapReduce
  • Programming Practices
  • Cookbook
  • Managing Hadoop
  • Running Hadoop in the Cloud: Working with Amazon Web Services (AWS)
  • Programming with Pig
  • Overview of Hadoop Related Technologies

Who Benefits

This class is focused on the Hadoop 2.6 release, but most features can be used on earlier 2.x releases.

This hands-on class has an approximately 40/60 lab-to-lecture ratio, combining engaging lectures, demos, group activities, and discussions with comprehensive machine-based practical programming labs and project work.


Attending students should have practical skills in Java, Unix, and Data Persistence with JPA2, or should have attended prior hands-on training in those topics.