Balian Blog

Python Feature Engineering Cookbook, 3rd Edition

An introduction to the book "Python Feature Engineering Cookbook, 3rd Edition" by Soledad Galli, published by Packt Publishing in 2024. The book is available in PDF format, in English. "Python Feature Engineering Cookbook, 3rd Edition" is listed under the category Uncategorized.

Leverage the power of Python to build real-world feature engineering and machine learning pipelines ready to be deployed to production.

Key Features
Craft powerful features from tabular, transactional, and time-series data
Develop efficient and reproducible real-world feature engineering pipelines
Optimize data transformation and save valuable time
Purchase of the print or Kindle book includes a free PDF eBook

Book Description
Streamline data preprocessing and feature engineering in your machine learning project with this third edition of the Python Feature Engineering Cookbook to make your data preparation more efficient. This guide addresses common challenges, such as imputing missing values and encoding categorical variables using practical solutions and open source Python libraries. You'll learn advanced techniques for transforming numerical variables, discretizing variables, and dealing with outliers. Each chapter offers step-by-step instructions and real-world examples, helping you understand when and how to apply various transformations for well-prepared data. The book explores feature extraction from complex data types such as dates, times, and text. You'll see how to create new features through mathematical operations and decision trees and use advanced tools like Featuretools and tsfresh to extract features from relational data and time series.
By the end, you'll be ready to build reproducible feature engineering pipelines that can be easily deployed into production, optimizing data preprocessing workflows and enhancing machine learning model performance.

What you will learn
Discover multiple methods to impute missing data effectively
Encode categorical variables while tackling high cardinality
Find out how to properly transform, discretize, and scale your variables
Automate feature extraction from date and time data
Combine variables strategically to create new and powerful features
Extract features from transactional data and time series
Learn methods to extract meaningful features from text data

Who this book is for
If you're a machine learning or data science enthusiast who wants to learn more about feature engineering, data preprocessing, and how to optimize these tasks, this book is for you. If you already know the basics of feature engineering and are looking to learn more advanced methods to craft powerful features, this book will help you. You should have basic knowledge of Python programming and machine learning to get started.

Cover
Title page
Copyright and credits
Foreword
Contributors
Table of Contents
Preface

Chapter 1: Imputing Missing Data
  Technical requirements
  Removing observations with missing data (How to do it... / How it works... / See also)
  Performing mean or median imputation (How to do it... / How it works...)
  Imputing categorical variables (How to do it... / How it works...)
  Replacing missing values with an arbitrary number (How to do it... / How it works...)
  Finding extreme values for imputation (How to do it... / How it works...)
  Marking imputed values (How to do it... / How it works... / There’s more...)
  Implementing forward and backward fill (How to do it... / How it works...)
  Carrying out interpolation (How to do it... / How it works... / See also)
  Performing multivariate imputation by chained equations (How to do it... / How it works... / See also)
  Estimating missing data with nearest neighbors (How to do it... / How it works...)

Chapter 2: Encoding Categorical Variables
  Technical requirements
  Creating binary variables through one-hot encoding (How to do it... / How it works... / There’s more...)
  Performing one-hot encoding of frequent categories (How to do it... / How it works... / There’s more...)
  Replacing categories with counts or the frequency of observations (How to do it... / How it works... / See also)
  Replacing categories with ordinal numbers (How to do it... / How it works... / There’s more...)
  Performing ordinal encoding based on the target value (How to do it... / How it works... / See also)
  Implementing target mean encoding (How to do it... / How it works... / There’s more...)
  Encoding with Weight of Evidence (How to do it... / How it works... / See also)
  Grouping rare or infrequent categories (How to do it... / How it works...)
  Performing binary encoding (How to do it... / How it works...)

Chapter 3: Transforming Numerical Variables
  Transforming variables with the logarithm function (Getting ready / How to do it... / How it works... / There’s more...)
  Transforming variables with the reciprocal function (How to do it... / How it works...)
  Using the square root to transform variables (How to do it... / How it works...)
  Using power transformations (How to do it... / How it works...)
  Performing Box-Cox transformations (How to do it... / How it works... / There’s more...)
  Performing Yeo-Johnson transformations (How to do it... / How it works... / There’s more...)

Chapter 4: Performing Variable Discretization
  Technical requirements
  Performing equal-width discretization (How to do it... / How it works... / See also)
  Implementing equal-frequency discretization (How to do it... / How it works...)
  Discretizing the variable into arbitrary intervals (How to do it... / How it works...)
  Performing discretization with k-means clustering (How to do it... / How it works... / See also)
  Implementing feature binarization (Getting ready / How to do it... / How it works...)
  Using decision trees for discretization (How to do it... / How it works... / There’s more...)

Chapter 5: Working with Outliers
  Technical requirements
  Visualizing outliers with boxplots and the inter-quartile proximity rule (How to do it... / How it works...)
  Finding outliers using the mean and standard deviation (How to do it... / How it works...)
  Using the median absolute deviation to find outliers (How to do it... / How it works...)
  Removing outliers (How to do it... / How it works... / See also)
  Bringing outliers back within acceptable limits (How to do it... / How it works... / See also)
  Applying winsorization (How to do it... / How it works... / See also)

Chapter 6: Extracting Features from Date and Time Variables
  Technical requirements
  Extracting features from dates with pandas (Getting ready / How to do it... / How it works... / There’s more... / See also)
  Extracting features from time with pandas (Getting ready / How to do it... / How it works... / There’s more...)
  Capturing the elapsed time between datetime variables (How to do it... / How it works... / There's more... / See also)
  Working with time in different time zones (How to do it... / How it works... / See also)
  Automating the datetime feature extraction with Feature-engine (How to do it... / How it works...)

Chapter 7: Performing Feature Scaling
  Technical requirements
  Standardizing the features (Getting ready / How to do it... / How it works...)
  Scaling to the maximum and minimum values (Getting ready / How to do it... / How it works...)
  Scaling with the median and quantiles (How to do it... / How it works...)
  Performing mean normalization (How to do it... / How it works... / There’s more...)
  Implementing maximum absolute scaling (Getting ready / How to do it... / There’s more...)
  Scaling to vector unit length (How to do it... / How it works...)

Chapter 8: Creating New Features
  Technical requirements
  Combining features with mathematical functions (Getting ready / How to do it... / How it works... / See also)
  Comparing features to reference variables (How to do it... / How it works... / See also)
  Performing polynomial expansion (Getting ready / How to do it... / How it works... / There’s more...)
  Combining features with decision trees (How to do it... / How it works... / See also)
  Creating periodic features from cyclical variables (Getting ready / How to do it... / How it works...)
  Creating spline features (Getting ready / How to do it... / How it works... / See also)

Chapter 9: Extracting Features from Relational Data with Featuretools
  Technical requirements
  Setting up an entity set and creating features automatically (Getting ready / How to do it... / How it works... / See also)
  Creating features with general and cumulative operations (Getting ready / How to do it... / How it works...)
  Combining numerical features (How to do it... / How it works...)
  Extracting features from date and time (How to do it... / How it works...)
  Extracting features from text (Getting ready / How to do it... / How it works...)
  Creating features with aggregation primitives (Getting ready / How to do it... / How it works...)

Chapter 10: Creating Features from a Time Series with tsfresh
  Technical requirements
  Extracting hundreds of features automatically from a time series (Getting ready / How to do it... / How it works... / See also)
  Automatically creating and selecting predictive features from time-series data (How to do it... / How it works... / See also)
  Extracting different features from different time series (How to do it... / How it works...)
  Creating a subset of features identified through feature selection (How to do it... / How it works...)
  Embedding feature creation into a scikit-learn pipeline (How to do it... / How it works... / See also)

Chapter 11: Extracting Features from Text Variables
  Technical requirements
  Counting characters, words, and vocabulary (Getting ready / How to do it... / How it works... / There’s more... / See also)
  Estimating text complexity by counting sentences (Getting ready / How to do it... / How it works... / There’s more...)
  Creating features with bag-of-words and n-grams (Getting ready / How to do it... / How it works... / See also)
  Implementing term frequency-inverse document frequency (Getting ready / How to do it... / How it works... / See also)
  Cleaning and stemming text variables (Getting ready / How to do it... / How it works...)

Index
Other Books You May Enjoy
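
For readers unfamiliar with two of the techniques named in the description (median imputation and one-hot encoding), here is a minimal sketch in plain Python. It is not taken from the book, which works with open source libraries; the helper names `impute_median` and `one_hot` and the sample data are made up for illustration.

```python
from statistics import median


def impute_median(values):
    """Replace None entries with the median of the observed values.

    Hypothetical helper for illustration; not from the book.
    """
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def one_hot(values):
    """Return one 0/1 column per distinct category, keyed by category name.

    Hypothetical helper for illustration; not from the book.
    """
    categories = sorted(set(values))
    return {c: [1 if v == c else 0 for v in values] for c in categories}


# Made-up sample data: a numerical column with gaps and a categorical column.
ages = [22, None, 35, 41, None]
cities = ["Rome", "Lima", "Rome", "Oslo", "Lima"]

ages_imputed = impute_median(ages)
city_columns = one_hot(cities)

print(ages_imputed)
print(city_columns)
```

In a real project the median and the list of categories would normally be learned from the training set only and then reused on new data, which is the kind of reproducibility the description refers to.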
Download the book Python Feature Engineering Cookbook, 3rd Edition