Learning Hadoop 2

Design and implement data processing, lifecycle management, and analytic workflows with the cutting-edge toolbox of Hadoop 2

Feb 2015

    Book Description

    If you are a system or application developer interested in learning how to solve practical problems using the Hadoop framework, then this book is ideal for you. You are expected to be familiar with the Unix/Linux command-line interface and have some experience with the Java programming language. Familiarity with Hadoop would be a plus.

    What you will learn

    • Write distributed applications using the MapReduce framework
    • Go beyond MapReduce and process data in real time with Samza and iteratively with Spark
    • Familiarize yourself with data mining approaches that work with very large datasets
    • Prototype applications on a VM and deploy them to a local cluster or to a cloud infrastructure (Amazon Web Services)
    • Conduct batch and real-time data analysis using SQL-like tools
    • Build data processing flows using Apache Pig and see how it enables the easy incorporation of custom functionality
    • Define and orchestrate complex workflows and pipelines with Apache Oozie
    • Manage your data lifecycle and changes over time

    Table of Contents

    1. Introduction
    2. Storage
    3. Processing - MapReduce and Beyond
    4. Real Time Computation with Samza
    5. Iterative Computation with Spark
    6. Data Analysis with Apache Pig
    7. Hadoop and SQL
    8. Data Lifecycle Management
    9. Making Development Easier
    10. Running a Hadoop Cluster
    11. Where to Go Next

    Code repository: https://github.com/packtpublishing/learning-hadoop-2
    PDF ISBN: 978-1-78328-552-5
    Publisher: Packt Publishing Limited
    Copyright owner: © 2015 Packt Publishing Limited
    Publication date: 2015
    Language: English
    Pages: 382