Practical Data Analysis - Second Edition

Introducing the book "Practical Data Analysis - Second Edition" by Hector Cuesta and Dr. Sampath Kumar, published by Packt Publishing - ebooks Account in 2016. The book is offered in 5 pages, in pdf format, in English. "Practical Data Analysis - Second Edition" is listed under the Uncategorized category.

Key Features
* Learn to use various data analysis tools and algorithms to classify, cluster, visualize, simulate, and forecast your data
* Apply machine learning algorithms to different kinds of data such as social networks, time series, and images
* A hands-on guide to understanding the nature of data and how to turn it into insight

Book Description
Beyond buzzwords such as big data or data science, there are great opportunities to innovate in many businesses using data analysis to get data-driven products. Data analysis involves asking many questions about data in order to discover insights and generate value for a product or a service. This book explains basic data algorithms without the theoretical jargon, and you'll get hands-on experience of turning data into insights using machine learning techniques. We will perform data-driven innovation processing for several types of data, such as text, images, social network graphs, documents, and time series, showing you how to process large amounts of data with MongoDB and Apache Spark.

What you will learn
* Acquire, format, and visualize your data
* Build an image-similarity search engine
* Generate meaningful visualizations anyone can understand
* Get started with analyzing social network graphs
* Find out how to implement sentiment text analysis
* Install data analysis tools such as Pandas, MongoDB, and Apache Spark
* Get to grips with Apache Spark
* Implement machine learning algorithms such as classification or forecasting

About the Authors
**Hector Cuesta** is founder and Chief Data Scientist at Dataxios, a machine intelligence research company. He holds a BA in Informatics and an M.Sc. in Computer Science. He provides consulting services for data-driven product design, with experience in a variety of industries including financial services, retail, fintech, e-learning, and human resources. He is a robotics enthusiast in his spare time.

**Dr. Sampath Kumar** works as an assistant professor and head of the Department of Applied Statistics at Telangana University. He has completed an M.Sc., an M.Phil., and a Ph.D. in statistics. He has five years of experience teaching PG courses and more than four years of experience in the corporate sector. His expertise is in statistical data analysis using SPSS, SAS, R, Minitab, MATLAB, and so on. He is an advanced programmer in SAS and MATLAB. He has taught various applied and pure statistics subjects to M.Sc. students, such as forecasting models, applied regression analysis, multivariate data analysis, and operations research. He is currently supervising Ph.D. scholars.

Table of Contents
Front matter: Cover; Copyright; Credits; About the Authors; About the Reviewers; www.PacktPub.com; Table of Contents; Preface

1. Getting Started
Computer science; Artificial intelligence; Machine learning; Statistics; Mathematics; Knowledge domain; Data, information, and knowledge; Inter-relationship between data, information, and knowledge; The nature of data; The data analysis process; The problem; Data preparation; Data exploration; Predictive modeling; Visualization of results; Quantitative versus qualitative data analysis; Importance of data visualization; What about big data?; Quantified self; Sensors and cameras; Social network analysis; Tools and toys for this book; Why Python?; Why mlpy?; Why D3.js?; Why MongoDB?; Summary

2. Preprocessing Data
Data sources; Open data; Text files; Excel files; SQL databases; NoSQL databases; Multimedia; Web scraping; Data scrubbing; Statistical methods; Text parsing; Data transformation; Data formats; Parsing a CSV file with the CSV module; Parsing CSV file using NumPy; JSON; Parsing JSON file using the JSON module; XML; Parsing XML in Python using the XML module; YAML; Data reduction methods; Filtering and sampling; Binned algorithm; Dimensionality reduction; Getting started with OpenRefine; Text facet; Clustering; Text filters; Numeric facets; Transforming data; Exporting data; Operation history; Summary

3. Getting to Grips with Visualization
What is visualization?; Working with web-based visualization; Exploring scientific visualization; Visualization in art; The visualization life cycle; Visualizing different types of data; HTML; DOM; CSS; JavaScript; SVG; Getting started with D3.js; Bar chart; Pie chart; Scatter plots; Single line chart; Multiple line chart; Interaction and animation; Data from social networks; An overview of visual analytics; Summary

4. Text Classification
Learning and classification; Bayesian classification; Naïve Bayes; E-mail subject line tester; The data; The algorithm; Classifier accuracy; Summary

5. Similarity-Based Image Retrieval
Image similarity search; Dynamic time warping; Processing the image dataset; Implementing DTW; Analyzing the results; Summary

6. Simulation of Stock Prices
Financial time series; Random Walk simulation; Monte Carlo methods; Generating random numbers; Implementation in D3.js; Quantitative analyst; Summary

7. Predicting Gold Prices
Working with time series data; Components of a time series; Smoothing time series; Linear regression; The data – historical gold prices; Nonlinear regressions; Kernel Ridge Regressions; Smoothing the gold prices time series; Predicting in the smoothed time series; Contrasting the predicted value; Summary

8. Working with Support Vector Machines
Understanding the multivariate dataset; Dimensionality reduction; Linear Discriminant Analysis (LDA); Principal Component Analysis (PCA); Getting started with SVM; Kernel functions; The double spiral problem; SVM implemented on mlpy; Summary

9. Modeling Infectious Diseases with Cellular Automata
Introduction to epidemiology; The epidemiology triangle; The epidemic models; The SIR model; Solving the ordinary differential equation for the SIR model with SciPy; The SIRS model; Modeling with Cellular Automaton; Cell, state, grid, neighborhood; Global stochastic contact model; Simulation of the SIRS model in CA with D3.js; Summary

10. Working with Social Graphs
Structure of a graph; Undirected graph; Directed graph; Social networks analysis; Acquiring the Facebook graph; Working with graphs using Gephi; Statistical analysis; Male to female ratio; Degree distribution; Histogram of a graph; Centrality; Transforming GDF to JSON; Graph visualization with D3.js; Summary

11. Working with Twitter Data
The anatomy of Twitter data; Tweet; Followers; Trending topics; Using OAuth to access Twitter API; Getting started with Twython; Simple search using Twython; Working with timelines; Working with followers; Working with places and trends; Working with user data; Streaming API; Summary

12. Data Processing and Aggregation with MongoDB
Getting started with MongoDB; Database; Collection; Document; Mongo shell; Insert/Update/Delete; Queries; Data preparation; Data transformation with OpenRefine; Inserting documents with PyMongo; Group; Aggregation framework; Pipelines; Expressions; Summary

13. Working with MapReduce
An overview of MapReduce; Programming model; Using MapReduce with MongoDB; Map function; Reduce function; Using mongo shell; Using Jupyter; Using PyMongo; Filtering the input collection; Grouping and aggregation; Counting the most common words in tweets; Summary

14. Online Data Analysis with Jupyter and Wakari
Getting started with Wakari; Creating an account in Wakari; Getting started with IPython notebook; Data visualization; Introduction to image processing with PIL; Opening an image; Working with an image histogram; Filtering; Operations; Transformations; Getting started with pandas; Working with Time Series; Working with multivariate datasets with DataFrame; Grouping, Aggregation, and Correlation; Sharing your Notebook; The data; Summary

15. Understanding Data Processing using Apache Spark
Platform for data processing; The Cloudera platform; Installing Cloudera VM; An introduction to the distributed file system; First steps with Hadoop Distributed File System – HDFS; File management with HUE – web interface; An introduction to Apache Spark; The Spark ecosystem; The Spark programming model; An introductory working example of Apache Startup; Summary

Index
A practical guide to obtaining, transforming, exploring, and analyzing data using Python, MongoDB, and Apache Spark.

Who This Book Is For
This book is for developers who want to implement data analysis and data-driven algorithms in a practical way. It is also suitable for those without a background in data analysis or data processing. Basic knowledge of Python programming, statistics, and linear algebra is assumed.

Style and approach
This is a hands-on guide to data analysis and data processing. The concrete examples are explained with simple code and accessible data.
