Mastering Apache Spark 2.x

Advanced techniques in complex Big Data processing, streaming analytics and machine learning

Aug 2017

Advanced analytics on your Big Data with the latest Apache Spark 2.x

Key Features

  • Master the art of real-time Big Data processing using Apache Spark 2.x
  • Perform machine learning, deep learning and streaming data analytics by extending the most up-to-date functionalities of Apache Spark
  • An advanced guide with a unique combination of tips, instructions and practical examples on using Apache Spark effectively

Book Description

Apache Spark is an in-memory, cluster-based Big Data processing system that provides a wide range of functionalities such as graph processing, machine learning, stream processing, and more. This book will take your knowledge of Apache Spark to the next level by teaching you how to expand Spark’s functionality and build your data flows and machine/deep learning programs on top of the platform.

The book starts with a quick overview of the Apache Spark ecosystem and introduces you to the new features and capabilities in Apache Spark 2.x. You will then work with the different modules in Apache Spark, such as interactive querying with Spark SQL, using DataFrames and Datasets effectively, streaming analytics with Spark Streaming, and performing machine learning and deep learning on Spark using MLlib and external tools such as H2O and Deeplearning4j. The book also contains chapters on efficient graph processing, memory management, and using Apache Spark in the cloud.

By the end of this book, you will have all the necessary information to master Apache Spark and use it efficiently for Big Data processing and analytics.

What you will learn

  • Get to grips with the newly introduced features in Apache Spark 2.x
  • Perform highly optimised unified batch and real-time data processing using SparkSQL and Structured Streaming
  • Evaluate large-scale Graph Processing and Analysis using GraphX and GraphFrames
  • Perform advanced machine learning and deep learning with Spark MLlib, SparkML, SystemML, H2O and DeepLearning4J
  • Learn how specific parameter settings affect overall performance of an Apache Spark cluster
  • Apply Apache Spark in elastic deployments using Jupyter and Zeppelin Notebooks, Docker, Kubernetes and the IBM Cloud

Who this book is for

If you are an intermediate-level Spark developer looking to master the advanced capabilities and use cases of Apache Spark 2.x, this book is for you. Big Data professionals who want to integrate and use the features of Apache Spark to build a strong Big Data pipeline will also find this book a useful resource. A fundamental knowledge of Apache Spark and the Scala programming language is assumed.

Table of Contents

  1. A first taste and what
  2. Apache Spark SQL
  3. The Catalyst Optimizer
  4. Project Tungsten
  5. Apache Spark Streaming
  6. Structured Streaming
  7. Apache Spark MLlib
  8. Apache SparkML
  9. Apache SystemML
  10. Deep Learning on Apache Spark with DeepLearning4J, Apache SystemML, H2O
  11. Apache Spark GraphX
  12. Apache Spark GraphFrames
  13. Apache Spark with Jupyter Notebooks on IBM Data Science Experience
  14. Apache Spark on Kubernetes
https://github.com/packtpublishing/mastering-apache-spark-2x
PDF ISBN: 978-1-78528-522-6
Publisher: Packt Publishing Limited
Copyright owner: © 2017 Packt Publishing Limited
Publication date: 2017
Language: English
Pages: 354