Apache Hadoop is an open-source distributed fault-tolerant system that leverages commodity hardware to achieve large-scale agile data storage and processing. Hadoop empowers applications to work with thousands of nodes and petabytes of data without exposing the complexity of clustering to the end user.

In this seminar we’ll discuss the motivation for Hadoop, the basice components and review five different strategies to developing in the Hadoop echo system.

This seminar is for application developers, team leaders and architects who want to understand Hadoop’s architecture and related projects.


  • Introduction to Hadoop
    • The motivation for Hadoop
    • Hadoop vs traditional data storage and processing
    • HDFS
    • The building blocks of Hadoop
  • Yarn – the operating system of Had
    • Bringing the code to the data
    • Introduction to Yarn
    • Yarn architecture
    • Yarn example
  • Pig
    • Why is Pig Kosher
    • What is it good for?
    • Pig Features
    • A Pig Example
  • HBase
    • Introduction to NoSQL
    • What is HBase good for?
    • HBase storage
    • HBase examples
  • Hive
    • The motivation for Hive
    • Hive strengths and weakness
    • Execution engines for Hive
    • sqoop
    • Hive examples
  • Spark – the distributed execution engine
    • The concept of RDD in Spark
    • Spark modules
    • Spark example
  • Datameer – a different approach to operating Hadoop
    • A spreadsheet approach to BI over Hadoop
    • Datameer features
    • A Datameer example


Main Speaker

Liran Eisenberg
Liran EisenbergAmber IT Consulting