Python Data Analysis: Perform data collection, data processing, wrangling, visualization, and model building using Python, 3rd Edition

Introducing the book "Python Data Analysis: Perform data collection, data processing, wrangling, visualization, and model building using Python, 3rd Edition" by Navlani, Avinash; Fandango, Armando; Idris, Ivan, published by Packt Publishing in 2021. This book is offered in 5 pages, in pdf format, in English. "Python Data Analysis: Perform data collection, data processing, wrangling, visualization, and model building using Python, 3rd Edition" is listed in the Uncategorized category.

Author: Navlani, Avinash; Fandango, Armando; Idris, Ivan
Publisher: Packt Publishing
Publication year: 2021
Number of pages: 5
Format: pdf
Language: en
ISBN: 9781789953459
Category: Uncategorized

Understand data analysis pipelines using machine learning algorithms and techniques with this practical guide.

Key Features
* Prepare and clean your data to use it for exploratory analysis, data manipulation, and data wrangling
* Discover supervised, unsupervised, probabilistic, and Bayesian machine learning methods
* Get to grips with graph processing and sentiment analysis

Book Description
Data analysis enables you to generate value from small and big data by discovering new patterns and trends, and Python is one of the most popular tools for analyzing a wide variety of data. With this book, you'll get up and running using Python for data analysis by exploring the different phases and methodologies used in data analysis and learning how to use modern libraries from the Python ecosystem to create efficient data pipelines.

Starting with the essential statistical and data analysis fundamentals using Python, you'll perform complex data analysis and modeling, data manipulation, data cleaning, and data visualization using easy-to-follow examples. You'll then understand how to conduct time series analysis and signal processing using ARMA models. As you advance, you'll get to grips with smart processing and data analytics using machine learning algorithms such as regression, classification, Principal Component Analysis (PCA), and clustering. In the concluding chapters, you'll work on real-world examples to analyze textual and image data using natural language processing (NLP) and image analytics techniques, respectively. Finally, the book will demonstrate parallel computing using Dask.

By the end of this data analysis book, you'll be equipped with the skills you need to prepare data for analysis and create meaningful data visualizations for forecasting values from data.

What you will learn
* Explore data science and its various process models
* Perform data manipulation using NumPy and pandas for aggregating, cleaning, and handling missing values
* Create interactive visualizations using Matplotlib, Seaborn, and Bokeh
* Retrieve, process, and store data in a wide range of formats
* Understand data preprocessing and feature engineering using pandas and scikit-learn
* Perform time series analysis and signal processing using sunspot cycle data
* Analyze textual data and image data to perform advanced analysis
* Get up to speed with parallel computing using Dask

Who this book is for
This book is for data analysts, business analysts, statisticians, and data scientists looking to learn how to use Python for data analysis. Students and academic faculties will also find this book useful for learning and teaching Python data analysis using a hands-on approach. A basic understanding of math and working knowledge of the Python programming language will help you get started with this book.
Cover; Title Page; Copyright and Credits; About Packt; Contributors; Table of Contents; Preface

Section 1: Foundation for Data Analysis

Chapter 1: Getting Started with Python Libraries
Understanding data analysis; The standard process of data analysis; The KDD process; SEMMA; CRISP-DM; Comparing data analysis and data science; The roles of data analysts and data scientists; The skillsets of data analysts and data scientists; Installing Python 3; Python installation and setup on Windows; Python installation and setup on Linux; Python installation and setup on Mac OS X with a GUI installer; Python installation and setup on Mac OS X with brew; Software used in this book; Using IPython as a shell; Reading manual pages; Where to find help and references to Python data analysis libraries; Using JupyterLab; Using Jupyter Notebooks; Advanced features of Jupyter Notebooks; Keyboard shortcuts; Installing other kernels; Running shell commands; Extensions for Notebook; Summary

Chapter 2: NumPy and pandas
Technical requirements; Understanding NumPy arrays; Array features; Selecting array elements; NumPy array numerical data types; dtype objects; Data type character codes; dtype constructors; dtype attributes; Manipulating array shapes; The stacking of NumPy arrays; Partitioning NumPy arrays; Changing the data type of NumPy arrays; Creating NumPy views and copies; Slicing NumPy arrays; Boolean and fancy indexing; Broadcasting arrays; Creating pandas DataFrames; Understanding pandas Series; Reading and querying the Quandl data; Describing pandas DataFrames; Grouping and joining pandas DataFrame; Working with missing values; Creating pivot tables; Dealing with dates; Summary; References

Chapter 3: Statistics
Technical requirements; Understanding attributes and their types; Types of attributes; Discrete and continuous attributes; Measuring central tendency; Mean; Mode; Median; Measuring dispersion; Skewness and kurtosis; Understanding relationships using covariance and correlation coefficients; Pearson's correlation coefficient; Spearman's rank correlation coefficient; Kendall's rank correlation coefficient; Central limit theorem; Collecting samples; Performing parametric tests; Performing non-parametric tests; Summary

Chapter 4: Linear Algebra
Technical requirements; Fitting to polynomials with NumPy; Determinant; Finding the rank of a matrix; Matrix inverse using NumPy; Solving linear equations using NumPy; Decomposing a matrix using SVD; Eigenvectors and Eigenvalues using NumPy; Generating random numbers; Binomial distribution; Normal distribution; Testing normality of data using SciPy; Creating a masked array using the numpy.ma subpackage; Summary

Section 2: Exploratory Data Analysis and Data Cleaning

Chapter 5: Data Visualization
Technical requirements; Visualization using Matplotlib; Accessories for charts; Scatter plot; Line plot; Pie plot; Bar plot; Histogram plot; Bubble plot; pandas plotting; Advanced visualization using the Seaborn package; lm plots; Bar plots; Distribution plots; Box plots; KDE plots; Violin plots; Count plots; Joint plots; Heatmaps; Pair plots; Interactive visualization with Bokeh; Plotting a simple graph; Glyphs; Layouts; Nested layout using row and column layouts; Multiple plots; Interactions; Hide click policy; Mute click policy; Annotations; Hover tool; Widgets; Tab panel; Slider; Summary

Chapter 6: Retrieving, Processing, and Storing Data
Technical requirements; Reading and writing CSV files with NumPy; Reading and writing CSV files with pandas; Reading and writing data from Excel; Reading and writing data from JSON; Reading and writing data from HDF5; Reading and writing data from HTML tables; Reading and writing data from Parquet; Reading and writing data from a pickle pandas object; Lightweight access with sqllite3; Reading and writing data from MySQL; Inserting a whole DataFrame into the database; Reading and writing data from MongoDB; Reading and writing data from Cassandra; Reading and writing data from Redis; PonyORM; Summary

Chapter 7: Cleaning Messy Data
Technical requirements; Exploring data; Filtering data to weed out the noise; Column-wise filtration; Row-wise filtration; Handling missing values; Dropping missing values; Filling in a missing value; Handling outliers; Feature encoding techniques; One-hot encoding; Label encoding; Ordinal encoder; Feature scaling; Methods for feature scaling; Feature transformation; Feature splitting; Summary

Chapter 8: Signal Processing and Time Series
Technical requirements; The statsmodels modules; Moving averages; Window functions; Defining cointegration; STL decomposition; Autocorrelation; Autoregressive models; ARMA models; Generating periodic signals; Fourier analysis; Spectral analysis filtering; Summary

Section 3: Deep Dive into Machine Learning

Chapter 9: Supervised Learning - Regression Analysis
Technical requirements; Linear regression; Multiple linear regression; Understanding multicollinearity; Removing multicollinearity; Dummy variables; Developing a linear regression model; Evaluating regression model performance; R-squared; MSE; MAE; RMSE; Fitting polynomial regression; Regression models for classification; Logistic regression; Characteristics of the logistic regression model; Types of logistic regression algorithms; Advantages and disadvantages of logistic regression; Implementing logistic regression using scikit-learn; Summary

Chapter 10: Supervised Learning - Classification Techniques
Technical requirements; Classification; Naive Bayes classification; Decision tree classification; KNN classification; SVM classification; Terminology; Splitting training and testing sets; Holdout; K-fold cross-validation; Bootstrap method; Evaluating the classification model performance; Confusion matrix; Accuracy; Precision; Recall; F-measure; ROC curve and AUC; Summary

Chapter 11: Unsupervised Learning - PCA and Clustering
Technical requirements; Unsupervised learning; Reducing the dimensionality of data; PCA; Performing PCA; Clustering; Finding the number of clusters; The elbow method; The silhouette method; Partitioning data using k-means clustering; Hierarchical clustering; DBSCAN clustering; Spectral clustering; Evaluating clustering performance; Internal performance evaluation; The Davies-Bouldin index; The silhouette coefficient; External performance evaluation; The Rand score; The Jaccard score; F-Measure or F1-score; The Fowlkes-Mallows score; Summary

Section 4: NLP, Image Analytics, and Parallel Computing

Chapter 12: Analyzing Textual Data
Technical requirements; Installing NLTK and SpaCy; Text normalization; Tokenization; Removing stopwords; Stemming and lemmatization; POS tagging; Recognizing entities; Dependency parsing; Creating a word cloud; Bag of Words; TF-IDF; Sentiment analysis using text classification; Classification using BoW; Classification using TF-IDF; Text similarity; Jaccard similarity; Cosine similarity; Summary

Chapter 13: Analyzing Image Data
Technical requirements; Installing OpenCV; Understanding image data; Binary images; Grayscale images; Color images; Color models; Drawing on images; Writing on images; Resizing images; Flipping images; Changing the brightness; Blurring an image; Face detection; Summary

Chapter 14: Parallel Computing Using Dask
Parallel computing using Dask; Dask data types; Dask Arrays; Dask DataFrames; DataFrame Indexing; Filter data; Groupby; Converting a pandas DataFrame into a Dask DataFrame; Converting a Dask DataFrame into a pandas DataFrame; Dask Bags; Creating a Dask Bag using Python iterable items; Creating a Dask Bag using a text file; Storing a Dask Bag in a text file; Storing a Dask Bag in a DataFrame; Dask Delayed; Preprocessing data at scale; Feature scaling in Dask; Feature encoding in Dask; Machine learning at scale; Parallel computing using scikit-learn; Reimplementing ML algorithms for Dask; Logistic regression; Clustering; Summary

Other Books You May Enjoy; Index

Reinforce your understanding of data science and data analysis from a statistical perspective to extract meaningful insights from your data using Python programming.

Key Features
* Work your way through the entire data analysis pipeline with statistics concerns in mind to make reasonable decisions
* Understand how various data science algorithms function
* Build a solid foundation in statistics for data science and machine learning using Python-based examples

Book Description
Statistics remain the backbone of modern analysis tasks, helping you to interpret the results produced by data science pipelines. This book is a detailed guide covering the math and various statistical methods required for undertaking data science tasks.

The book starts by showing you how to preprocess data and inspect distributions and correlations from a statistical perspective. You'll then get to grips with the fundamentals of statistical analysis and apply its concepts to real-world datasets. As you advance, you'll find out how statistical concepts emerge from different stages of data science pipelines, understand the summary of datasets in the language of statistics, and use it to build a solid foundation for robust data products such as explanatory models and predictive models. Once you've uncovered the working mechanism of data science algorithms, you'll cover essential concepts for efficient data collection, cleaning, mining, visualization, and analysis. Finally, you'll implement statistical methods in key machine learning tasks such as classification, regression, tree-based methods, and ensemble learning.
By the end of this Essential Statistics for Non-STEM Data Analysts book, you'll have learned how to build and present a self-contained, statistics-backed data product to meet your business goals.

What you will learn
* Find out how to grab and load data into an analysis environment
* Perform descriptive analysis to extract meaningful summaries from data
* Discover probability, parameter estimation, hypothesis tests, and experiment design best practices
* Get to grips with resampling and bootstrapping in Python
* Delve into statistical tests with variance analysis, time series analysis, and A/B test examples
* Understand the statistics behind popular machine learning algorithms
* Answer questions on statistics for data scientist interviews

Who this book is for
This book is an entry-level guide for data science enthusiasts, data analysts, and anyone starting out in the field of data science and looking to learn the essential statistical concepts with the help of simple explanations and examples. If you're a developer or student with a non-mathematical background, you'll find this book useful. Working knowledge of the Python programming language is required.

Understand data analysis concepts to make accurate decisions based on data using Python programming and Jupyter Notebook.

Key Features
* Find out how to use Python code to extract insights from data using real-world examples
* Work with structured data and free text sources to answer questions and add value using data
* Perform data analysis from scratch with the help of clear explanations for cleaning, transforming, and visualizing data

Book Description
Data literacy is the ability to read, analyze, work with, and argue using data. Data analysis is the process of cleaning and modeling your data to discover useful information. This book combines these two concepts by sharing proven techniques and hands-on examples so that you can learn how to communicate effectively using data.

After introducing you to the basics of data analysis using Jupyter Notebook and Python, the book will take you through the fundamentals of data. Packed with practical examples, this guide will teach you how to clean, wrangle, analyze, and visualize data to gain useful insights, and you'll discover how to answer questions using data with easy-to-follow steps.

Later chapters teach you about storytelling with data using charts, such as histograms and scatter plots. As you advance, you'll understand how to work with unstructured data using natural language processing (NLP) techniques to perform sentiment analysis. All the knowledge you gain will help you discover key patterns and trends in data using real-world examples. In addition to this, you will learn how to handle data of varying complexity to perform efficient data analysis using modern Python libraries.

By the end of this book, you'll have gained the practical skills you need to analyze data with confidence.

What you will learn
* Understand the importance of data literacy and how to communicate effectively using data
* Find out how to use Python packages such as NumPy, pandas, Matplotlib, and the Natural Language Toolkit (NLTK) for data analysis
* Wrangle data and create DataFrames using pandas
* Produce charts and data visualizations using time-series datasets
* Discover relationships and how to join data together using SQL
* Use NLP techniques to work with unstructured data to create sentiment analysis models
* Discover patterns in real-world datasets that provide accurate insights

Who this book is for
This book is for aspiring data analysts and data scientists looking for hands-on tutorials and real-world examples to understand data analysis concepts using SQL, Python, and Jupyter Notebook.
Anyone looking to evolve their skills to become data-driven personally and professionally will also find this book useful. No prior knowledge of data analysis or programming is required to get started with this book.

Data analysis enables one to generate value from small and big data by discovering new patterns and trends. Python is a popular tool for analyzing a wide variety of data. This book shows how to get up and running with Python for data analysis by exploring the different phases and methodologies used in data analysis and learning how to use modern libraries from the Python ecosystem to create efficient data pipelines.

The book will take you on a journey through the evolution of data analysis, explaining each step in the process in a simple, easy-to-understand manner. You will learn how to use various Python libraries to work with data, and how to sift through the many different types of data, clean it, and analyze it to gain useful insights.

Put your data science knowledge to work with this practical guide to statistics. You'll understand the working mechanism of each method used and find out how data science algorithms function. This book will help you learn the statistical techniques required for key model building and functioning using Python.
