
Published by Addison-Wesley (October 28, 2015) © 2016

Douglas Eadline
    VitalSource eTextbook (Lifetime access)
    €14,99
    ISBN-13: 9780134049991

    Hadoop 2 Quick-Start Guide: Learn the Essentials of Big Data Computing in the Apache Hadoop 2 Ecosystem, 1st edition


    Language: English

    Product Information

    Get Started Fast with Apache Hadoop® 2, YARN, and Today’s Hadoop Ecosystem

     

    With Hadoop 2.x and YARN, Hadoop moves beyond MapReduce to become practical for virtually any type of data processing. Hadoop 2.x and the Data Lake concept represent a radical shift away from conventional approaches to data usage and storage. Hadoop 2.x installations offer unmatched scalability and breakthrough extensibility that supports new and existing Big Data analytics processing methods and models.

     

    Hadoop® 2 Quick-Start Guide is the first easy, accessible guide to Apache Hadoop 2.x, YARN, and the modern Hadoop ecosystem. Building on his unsurpassed experience teaching Hadoop and Big Data, author Douglas Eadline covers all the basics you need to know to install and use Hadoop 2 on personal computers or servers, and to navigate the powerful technologies that complement it.

     

    Eadline concisely introduces and explains every key Hadoop 2 concept, tool, and service, illustrating each with a simple “beginning-to-end” example and identifying trustworthy, up-to-date resources for learning more.

     

    This guide is ideal if you want to learn about Hadoop 2 without getting mired in technical details. Douglas Eadline will bring you up to speed quickly, whether you’re a user, admin, devops specialist, programmer, architect, analyst, or data scientist.

     

    Coverage Includes

    • Understanding what Hadoop 2 and YARN do, and how they improve on Hadoop 1 with MapReduce
    • Understanding Hadoop-based Data Lakes versus RDBMS Data Warehouses
    • Installing Hadoop 2 and core services on Linux machines, virtualized sandboxes, or clusters
    • Exploring the Hadoop Distributed File System (HDFS)
    • Understanding the essentials of MapReduce and YARN application programming
    • Simplifying programming and data movement with Apache Pig, Hive, Sqoop, Flume, Oozie, and HBase
    • Observing application progress, controlling jobs, and managing workflows
    • Managing Hadoop efficiently with Apache Ambari, including recipes for the HDFS to NFSv3 gateway, HDFS snapshots, and YARN configuration
    • Learning basic Hadoop 2 troubleshooting, and installing Apache Hue and Apache Spark

     

    Foreword

    Preface

    Acknowledgments

    About the Author

     

    Chapter 1: Background and Concepts

    Defining Apache Hadoop

    A Brief History of Apache Hadoop

    Defining Big Data

    Hadoop as a Data Lake

    Using Hadoop: Administrator, User, or Both

    First There Was MapReduce

    Moving Beyond MapReduce with Hadoop V2

    The Apache Hadoop Project Ecosystem

    Summary and Additional Resources

     

    Chapter 2: Installation Recipes

    Core Hadoop Services

    Planning Your Resources

    Installing on a Desktop or Laptop

    Installing Hadoop with Ambari

    Installing Hadoop in the Cloud Using Apache Whirr

    Summary and Additional Resources

     

    Chapter 3: Hadoop Distributed File System Basics

    Hadoop Distributed File System Design Features

    HDFS Components

    HDFS User Commands

    HDFS Web GUI

    Using HDFS in Programs

    Summary and Additional Resources

     

    Chapter 4: Running Example Programs and Benchmarks

    Running MapReduce Examples

    Running Basic Hadoop Benchmarks

    Summary and Additional Resources

     

    Chapter 5: Hadoop MapReduce Framework

    The MapReduce Model

    MapReduce Parallel Data Flow

    Fault Tolerance and Speculative Execution

    Summary and Additional Resources

     

    Chapter 6: MapReduce Programming

    Compiling and Running the Hadoop WordCount Example

    Using the Streaming Interface

    Using the Pipes Interface

    Compiling and Running the Hadoop Grep Chaining Example

    Debugging MapReduce

    Summary and Additional Resources

     

    Chapter 7: Essential Hadoop Tools

    Using Apache Pig

    Using Apache Hive

    Using Apache Sqoop to Acquire Relational Data

    Using Apache Flume to Acquire Data Streams

    Manage Hadoop Workflows with Apache Oozie

    Using Apache HBase

    Summary and Additional Resources

     

    Chapter 8: Hadoop YARN Applications

    YARN Distributed-Shell

    Using the YARN Distributed-Shell

    Structure of YARN Applications

    YARN Application Frameworks

    Summary and Additional Resources

     

    Chapter 9: Managing Hadoop with Apache Ambari

    Quick Tour of Apache Ambari

    Managing Hadoop Services

    Changing Hadoop Properties

    Summary and Additional Resources

     

    Chapter 10: Basic Hadoop Administration Procedures

    Basic Hadoop YARN Administration

    Basic HDFS Administration

    Capacity Scheduler Background

    Hadoop Version 2 MapReduce Compatibility

    Summary and Additional Resources

     

    Appendix A: Book Webpage and Code Download

     

    Appendix B: Getting Started Flowchart and Troubleshooting Guide

    Getting Started Flowchart

    General Hadoop Troubleshooting Guide

     

    Appendix C: Summary of Apache Hadoop Resources by Topic

    General Hadoop Information

    Hadoop Installation Recipes

    HDFS

    Examples

    MapReduce

    MapReduce Programming

    Essential Tools

    YARN Application Frameworks

    Ambari Administration

    Basic Hadoop Administration

     

    Appendix D: Installing the Hue Hadoop GUI

    Hue Installation

    Starting Hue

    Hue User Interface

     

    Appendix E: Installing Apache Spark

    Spark Installation on a Cluster

    Starting Spark across the Cluster

    Installing and Starting Spark on the Pseudo-distributed Single-Node Installation

    Run Spark Examples

     

    Index

     
