Apache Flume: Distributed Log Collection for Hadoop - Second Edition

Design and implement a series of Flume agents to send streamed data into Hadoop

    Who this book is for

    If you are a Hadoop programmer who wants to learn about Flume to be able to move datasets into Hadoop in a timely and replicable manner, then this book is ideal for you. No prior knowledge about Apache Flume is necessary, but a basic knowledge of Hadoop and the Hadoop File System (HDFS) is assumed.

    What you will learn

    • Understand the Flume architecture, and how to download and install open source Flume from Apache
    • Follow a detailed example of transporting weblogs in near real time (NRT) to Kibana/Elasticsearch, with archival in HDFS
    • Learn tips and tricks for transporting logs and data in your production environment
    • Understand and configure the Hadoop File System (HDFS) Sink (a minimal sample configuration follows this list)
    • Use a morphline-backed Sink to feed data into Solr
    • Create redundant data flows using sink groups
    • Configure and use various sources to ingest data
    • Inspect data records and move them between multiple destinations based on payload content
    • Transform data en route to Hadoop and monitor your data flows
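
    To give a flavor of the configuration style the book walks through, here is a minimal sketch of a single-agent flow (the agent name a1, the component names, the port, and the HDFS path are illustrative placeholders, not examples taken from the book) wiring a netcat source through a memory channel into the HDFS Sink:

        # Name the components of agent "a1"
        a1.sources = r1
        a1.channels = c1
        a1.sinks = k1

        # Source: accept newline-separated events on a TCP port
        a1.sources.r1.type = netcat
        a1.sources.r1.bind = localhost
        a1.sources.r1.port = 44444
        a1.sources.r1.channels = c1

        # Channel: buffer events in memory between source and sink
        a1.channels.c1.type = memory
        a1.channels.c1.capacity = 1000

        # Sink: write events to HDFS, bucketed by date
        # (useLocalTimeStamp supplies the timestamp that the %Y/%m/%d escapes
        # need, since the netcat source does not add one to the event headers)
        a1.sinks.k1.type = hdfs
        a1.sinks.k1.hdfs.path = hdfs://namenode:8020/flume/events/%Y/%m/%d
        a1.sinks.k1.hdfs.fileType = DataStream
        a1.sinks.k1.hdfs.useLocalTimeStamp = true
        a1.sinks.k1.channel = c1

    A file like this would typically be launched with flume-ng agent --conf conf --conf-file flume.conf --name a1; Chapters 3 to 5 cover channels, sinks, and sources in detail.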

    Table of Contents

    1. Overview and Architecture
    2. Flume Quick Start
    3. Channels
    4. Sinks and Sink Processors
    5. Sources and Channel Selectors
    6. Interceptors, ETL, and Routing
    7. Examples
    8. Monitoring Flume
    9. There Is No Spoon – the Realities of Real-time Distributed Data Collection
    PDF ISBN: 978-1-78439-914-6
    Publisher: Packt Publishing Limited
    Copyright owner: © 2015 Packt Publishing Limited
    Publication date: 2015
    Language: English
    Pages: 178