Balian Blog

HDInsight Essentials - Second Edition

An introduction to the book "HDInsight Essentials - Second Edition" by Rajesh Nadipalli, published by Packt Publishing Limited in 2015. The book is offered in 5 pages, in PDF format, in English. "HDInsight Essentials - Second Edition" is listed under the Uncategorized category.

Learn how to build and deploy a modern big data architecture to empower your business

About This Book

  • Learn how to quickly provision a Hadoop cluster using Windows Azure Cloud Services
  • Build an end-to-end application for a big data problem using open source software
  • Discover more about modern data architecture with this guide, to help you understand the transition from legacy relational Enterprise Data Warehouse

Who This Book Is For

If you want to discover one of the latest tools designed to produce stunning Big Data insights, this book features everything you need to get to grips with your data. Whether you are a data architect, developer, or a business strategist, HDInsight adds value in everything from development, administration, and reporting.

What You Will Learn

  • Explore core features of Hadoop, including HDFS2 and YARN, the new resource manager for Hadoop
  • Build your HDInsight cluster in minutes and learn how to administer it using Azure PowerShell
  • Discover what's new in Hadoop 2.X and the reference architecture for a modern data lake based on Hadoop
  • Find out more about a data lake vision and its core capabilities
  • Ingest and organize your data into HDInsight
  • Utilize open source software to transform data including Hive, Pig, and MapReduce, and make it available for decision makers
  • Get to grips with architectural considerations for scalability, maintainability, and security

In Detail

Traditional relational databases are today ineffective at dealing with the challenges presented by Big Data. A Hadoop-based architecture offers a radical solution, as it is designed specifically to handle huge sets of unstructured data. This book takes you through the journey of building a modern data lake architecture using HDInsight, a Hadoop-based service that allows you to successfully manage high volume and velocity data in the Microsoft Azure Cloud.

Featuring a wealth of practical examples, you'll find tips and techniques to provision your own HDInsight cluster to ingest, organize, transform, and analyze data. While guided through HDInsight, you'll explore the wider Hadoop ecosystem with plenty of working examples on Hadoop technologies including Hive, Pig, MapReduce, HBase, Storm, and analytics solutions including using Excel PowerQuery, PowerMap, and PowerBI.

Table of Contents

  • Front matter: Cover; Copyright; Credits; About the Author; About the Reviewers; www.PacktPub.com; Table of Contents; Preface
  • Chapter 1: Hadoop and HDInsight in a Heartbeat: Data is everywhere; Business value of big data; Hadoop concepts; Brief history of Hadoop; Core components; Hadoop cluster layout; HDFS overview; Writing a file to HDFS; Reading a file from HDFS; HDFS basic commands; YARN overview; YARN application life cycle; YARN workloads; Hadoop distributions; HDInsight overview; HDInsight and Hadoop relationship; Hadoop on Windows deployment options; Microsoft Azure HDInsight Service; HDInsight Emulator; Hortonworks Data Platform (HDP) for Windows; Summary
  • Chapter 2: Enterprise Data Lake using HDInsight: Enterprise Data Warehouse architecture; Source systems; Data warehouse; Storage; Processing; User access; Provisioning and monitoring; Data governance and security; Pain points of EDW; The next generation Hadoop-based Enterprise data architecture; Source systems; Data Lake; Storage; Processing; User access; Provisioning and monitoring; Data governance, security, and metadata; Journey to your Data Lake dream; Ingestion and organization; Transformation (rules driven); Access, analyze, and report; Tools and technology for Hadoop ecosystem; Use case powered by Microsoft HDInsight; Problem statement; Solution; Source systems; Storage; Processing; User access; Benefits; Summary
  • Chapter 3: HDInsight Service on Azure: Registering for an Azure account; Azure storage; Provisioning an HDInsight cluster; Cluster topology; Provisioning using Azure Powershell; Creating a storage container; Provisioning a new HDInsight cluster; HDInsight management dashboard; Dashboard; Monitor; Configuration; Exploring clusters using the remote desktop; Running a sample MapReduce; Deleting the cluster; HDInsight Emulator for the development; Installing HDInsight Emulator; Installation verification; Using HDInsight Emulator; Summary
  • Chapter 4: Administering Your HDInsight Cluster: Monitoring cluster health; Name Node status; The Name Node Overview page; Datanode Status; Utilities and logs; Hadoop Service Availability; YARN Application Status; Azure storage management; Configuring your storage account; Monitoring your storage account; Managing access keys; Deleting your storage account; Azure Powershell; Access Azure Blob storage using Azure Powershell; Summary
  • Chapter 5: Ingest and Organize Data Lake: End-to-end Data Lake solution; Ingesting to Data Lake using HDFS command; Connecting to a Hadoop client; Getting your files on the local storage; Transferring to HDFS; Loading data to Azure Blob storage using Azure PowerShell; Loading files to Data Lake using GUI tools; Storage access keys; Storage tools; CloudXplorer; Key benefits; Registering your storage account; Uploading files to your Blob storage; Using Sqoop to move data from RDBMS to Data Lake; Key benefits; Two modes of using Sqoop; Using Sqoop to import data (SQL to Hadoop); Organizing your Data Lake in HDFS; Managing file metadata using HCatalog; Key benefits; Using HCatalog Command Line to create tables; Summary
  • Chapter 6: Transform Data in the Data Lake: Transformation overview; Tools for transforming data in Data Lake; HCatalog; Persisting HCatalog metastore in a SQL database; Apache Hive; Hive architecture; Starting Hive in HDInsight; Basic Hive commands; Apache Pig; Pig architecture; Starting Pig in HDInsight node; Basic Pig commands; Pig or Hive; MapReduce; The mapper code; The reducer code; The driver code; Executing MapReduce on HDInsight; Azure Powershell for execution of Hadoop jobs; Transformation for the OTP project; Cleaning data using Pig; Executing Pig script; Registering a refined and aggregate table using Hive; Executing Hive script; Reviewing results; Other tools used for transformation; Oozie; Spark; Summary
  • Chapter 7: Analyze and Report from Data Lake: Data access overview; Analysis using Excel and Microsoft Hive ODBC driver; Prerequisites; Step 1 – installing the Microsoft Hive ODBC driver; Step 2 – creating Hive ODBC Data Source; Step 3 – importing data to Excel; Analysis using Excel Power Query; Prerequisites; Step 1 – installing the Microsoft Power Query for Excel; Step 2 – importing Azure Blob storage data into Excel; Step 3 – analyzing data using Excel; Other BI features in Excel; PowerPivot; Power View and Power Map; Step 1 – importing Azure Blob storage data into Excel; Step 2 – launch map view; Step 3 – configure the map; Power BI Catalog; Ad hoc analysis using Hive; Other alternatives for analysis; RHadoop; Apache Giraph; Apache Mahout; Azure Machine Learning; Summary
  • Chapter 8: HDInsight 3.1 New Features: HBase; HBase positioning in Data Lake and use cases; Provisioning HDInsight HBase cluster; Creating a sample HBase schema; Designing the airline on-time performance table; Connecting to HBase using the HBase shell; Creating an HBase table; Loading data to the HBase table; Querying data from the HBase table; HBase additional information; Storm; Storm positioning in Data Lake; Storm key concepts; Provisioning HDInsight Storm cluster; Running a sample Storm topology; Connecting to Storm using Storm shell; Running the Storm Wordcount topology; Monitoring status of the Wordcount topology; Additional information on Storm; Apache Tez; Summary
  • Chapter 9: Strategy for a Successful Data Lake Implementation: Challenges on building a production Data Lake; The success path for a production Data Lake; Identifying the big data problem; Proof of technology for Data Lake; Form a Data Lake Center of Excellence; Executive sponsors; Data Lake consumers; Development; Operations and infrastructure; Architectural considerations; Extensible and modular; Metadata-driven solution; Integration strategy; Security; Online resources; Summary
  • Index

Download the book HDInsight Essentials - Second Edition