Apache Flume: Distributed Log Collection for Hadoop - Second Edition

Design and implement a series of Flume agents to send streamed data into Hadoop

    Who this book is for

    If you are a Hadoop programmer who wants to learn about Flume to be able to move datasets into Hadoop in a timely and replicable manner, then this book is ideal for you. No prior knowledge about Apache Flume is necessary, but a basic knowledge of Hadoop and the Hadoop File System (HDFS) is assumed.

    What you will learn

    • Understand the Flume architecture, and how to download and install open source Flume from Apache
    • Follow a detailed example of transporting weblogs in near real time (NRT) to Kibana/Elasticsearch, with archival in HDFS
    • Learn tips and tricks for transporting logs and data in your production environment
    • Understand and configure the Hadoop File System (HDFS) Sink (a minimal sample configuration follows this list)
    • Use a morphline-backed Sink to feed data into Solr
    • Create redundant data flows using sink groups
    • Configure and use various sources to ingest data
    • Inspect data records and move them between multiple destinations based on payload content
    • Transform data en route to Hadoop and monitor your data flows
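
    To give a flavor of the configuration style the book walks through, here is a minimal sketch of a single-agent flow (the agent name a1, the component names, the port, and the HDFS path are illustrative placeholders, not examples taken from the book) wiring a netcat source through a memory channel into the HDFS Sink:

        # Name the components of agent "a1"
        a1.sources = r1
        a1.channels = c1
        a1.sinks = k1

        # Source: accept newline-separated events on a TCP port
        a1.sources.r1.type = netcat
        a1.sources.r1.bind = localhost
        a1.sources.r1.port = 44444
        a1.sources.r1.channels = c1

        # Channel: buffer events in memory between source and sink
        a1.channels.c1.type = memory
        a1.channels.c1.capacity = 1000

        # Sink: write events to HDFS, bucketed by date
        # (useLocalTimeStamp supplies the timestamp that the %Y/%m/%d escapes
        # need, since the netcat source does not add one to the event headers)
        a1.sinks.k1.type = hdfs
        a1.sinks.k1.hdfs.path = hdfs://namenode:8020/flume/events/%Y/%m/%d
        a1.sinks.k1.hdfs.fileType = DataStream
        a1.sinks.k1.hdfs.useLocalTimeStamp = true
        a1.sinks.k1.channel = c1

    A file like this would typically be launched with flume-ng agent --conf conf --conf-file flume.conf --name a1; Chapters 3 to 5 cover channels, sinks, and sources in detail.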

    Table of Contents

    1. Overview and Architecture
    2. Flume Quick Start
    3. Channels
    4. Sinks and Sink Processors
    5. Sources and Channel Selectors
    6. Interceptors, ETL, and Routing
    7. Examples
    8. Monitoring Flume
    9. There Is No Spoon – the Realities of Real-time Distributed Data Collection
    PDF ISBN: 978-1-78439-914-6
    Publisher: Packt Publishing Limited
    Copyright owner: © 2015 Packt Publishing Limited
    Publication date: 2015
    Language: English
    Pages: 178