Applied Machine Learning

An introduction to the book “Applied Machine Learning” by David Forsyth, published by Springer International Publishing (imprint: Springer) in 2019. The book is available in PDF format, in English. “Applied Machine Learning” is listed under the category “Uncategorized.”

Machine learning methods are now an important tool for scientists, researchers, engineers and students in a wide range of areas. This book is written for people who want to adopt and use the main tools of machine learning but aren’t necessarily going to become machine learning researchers. Intended for students in final-year undergraduate or first-year graduate computer science programs in machine learning, this textbook is a machine learning toolkit. Applied Machine Learning covers many topics for people who want to use machine learning processes to get things done, with a strong emphasis on using existing tools and packages rather than writing one’s own code. A companion to the author’s Probability and Statistics for Computer Science (PSCS), this book picks up where the earlier book left off (but also supplies a summary of probability that the reader can use). Emphasizing the usefulness of standard machinery from applied statistics, this textbook gives an overview of the major applied areas in learning, including coverage of:

• classification using standard machinery (naive Bayes; nearest neighbor; SVM)
• clustering and vector quantization (largely as in PSCS)
• PCA (largely as in PSCS)
• variants of PCA (NIPALS; latent semantic analysis; canonical correlation analysis)
• linear regression (largely as in PSCS)
• generalized linear models, including logistic regression
• model selection with the lasso and elastic net
• robustness and M-estimators
• Markov chains and HMMs (largely as in PSCS)
• EM in fairly gory detail; long experience teaching this suggests one detailed example is required, which students hate, but once they’ve been through that, the next one is easy
• simple graphical models (in the variational inference section)
• classification with neural networks, with a particular emphasis on image classification
• autoencoding with neural networks
• structure learning

Preface
Contents
Classification
1 Learning to Classify
1.1.1 The Error Rate and Other Summaries of Performance
1.1.2 More Detailed Evaluation
1.1.3 Overfitting and Cross-Validation
1.2 Classifying with Nearest Neighbors
1.2.1 Practical Considerations for Nearest Neighbors
1.3 Naive Bayes
1.3.1 Cross-Validation to Choose a Model
1.3.2 Missing Data
1.4.2 Remember These Facts
1.4.4 Be Able to
2.1 The Support Vector Machine
2.1.1 The Hinge Loss
2.1.2 Regularization
2.1.3 Finding a Classifier with Stochastic Gradient Descent
2.1.4 Searching for λ
2.1.5 Summary: Training with Stochastic Gradient Descent
2.1.6 Example: Adult Income with an SVM
2.1.7 Multiclass Classification with SVMs
2.2 Classifying with Random Forests
2.2.1 Building a Decision Tree
2.2.2 Choosing a Split with Information Gain
2.2.4 Building and Evaluating a Decision Forest
2.2.5 Classifying Data Items with a Decision Forest
2.3.2 Remember These Facts
2.3.4 Be Able to
3.1 Held-Out Loss Predicts Test Loss
3.1.1 Sample Means and Expectations
3.1.3 A Generalization Bound
3.2 Test and Training Error for a Classifier from a Finite Family
3.2.1 Hoeffding's Inequality
3.2.2 Test from Training for a Finite Family of Predictors
3.2.3 Number of Examples Required
3.3.1 Predictors and Binary Functions
3.3.2 Symmetrization
3.3.3 Bounding the Generalization Error
3.4.2 Remember These Facts
3.4.3 Be Able to
High Dimensional Data
4.1 Summaries and Simple Plots
4.1.2 Stem Plots and Scatterplot Matrices
4.1.3 Covariance
4.1.4 The Covariance Matrix
4.2.1 The Curse: Data Isn't Where You Think It Is
4.2.2 Minor Banes of Dimension
4.3 Using Mean and Covariance to Understand High Dimensional Data
4.3.1 Mean and Covariance Under Affine Transformations
4.3.2 Eigenvectors and Diagonalization
4.3.3 Diagonalizing Covariance by Rotating Blobs
4.4 The Multivariate Normal Distribution
4.4.1 Affine Transformations and Gaussians
4.4.2 Plotting a 2D Gaussian: Covariance Ellipses
4.4.3 Descriptive Statistics and Expectations
4.4.4 More from the Curse of Dimension
4.5.2 Remember These Facts
4.5.3 Remember These Procedures
5.1.1 Approximating Blobs
5.1.2 Example: Transforming the Height–Weight Blob
5.1.3 Representing Data on Principal Components
5.1.4 The Error in a Low Dimensional Representation
5.1.5 Extracting a Few Principal Components with NIPALS
5.1.6 Principal Components and Missing Values
5.1.7 PCA as Smoothing
5.2 Example: Representing Colors with Principal Components
5.3 Example: Representing Faces with Principal Components
5.4.4 Be Able to
6.1 The Singular Value Decomposition
6.1.1 SVD and PCA
6.1.3 Smoothing with the SVD
6.2.1 Choosing Low D Points Using High D Distances
6.2.2 Using a Low Rank Approximation to Factor
6.2.3 Example: Mapping with Multidimensional Scaling
6.3 Example: Text Models and Latent Semantic Analysis
6.3.1 The Cosine Distance
6.3.2 Smoothing Word Counts
6.3.4 Obtaining the Meaning of Words
6.3.5 Example: Mapping NIPS Words
6.3.6 TF-IDF
6.4.4 Be Able to
7.1 Canonical Correlation Analysis
7.2 Example: CCA of Words and Pictures
7.3 Example: CCA of Albedo and Shading
7.3.1 Are Correlations Significant?
7.4.4 Be Able to
Clustering
8.1 Agglomerative and Divisive Clustering
8.1.1 Clustering and Distance
8.2 The k-Means Algorithm and Variants
8.2.1 How to Choose k
8.2.2 Soft Assignment
8.2.3 Efficient Clustering and Hierarchical k-Means
8.2.5 Example: Groceries in Portugal
8.2.6 General Comments on k-Means
8.3 Describing Repetition with Vector Quantization
8.3.1 Vector Quantization
8.3.2 Example: Activity from Accelerometer Data
8.4.3 Remember These Procedures
9.1 Mixture Models and Clustering
9.1.1 A Finite Mixture of Blobs
9.1.2 Topics and Topic Models
9.2 The EM Algorithm
9.2.1 Example: Mixture of Normals: The E-step
9.2.2 Example: Mixture of Normals: The M-step
9.2.3 Example: Topic Model: The E-step
9.2.5 EM in Practice
9.3.4 Be Able to
Regression
10.1 Overview
10.1.1 Regression to Spot Trends
10.2 Linear Regression and Least Squares
10.2.1 Linear Regression
10.2.2 Choosing β
10.2.4 R-squared
10.2.5 Transforming Variables
10.2.6 Can You Trust Your Regression?
10.3 Visualizing Regressions to Find Problems
10.3.1 Problem Data Points Have Significant Impact
10.3.2 The Hat Matrix and Leverage
10.3.3 Cook's Distance
10.3.4 Standardized Residuals
10.4 Many Explanatory Variables
10.4.2 Regularizing Linear Regressions
10.4.3 Example: Weight Against Body Measurements
10.5.3 Remember These Procedures
10.5.4 Be Able to
11.1 Model Selection: Which Model Is Best?
11.1.1 Bias and Variance
11.1.2 Choosing a Model Using Penalties: AIC and BIC
11.1.3 Choosing a Model Using Cross-Validation
11.1.4 Greedy Search with Stagewise Regression
11.1.5 What Variables Are Important?
11.2 Robust Regression
11.2.1 M-Estimators and Iteratively Reweighted Least Squares
11.2.2 Scale for M-Estimators
11.3.1 Logistic Regression
11.3.2 Multiclass Logistic Regression
11.3.3 Regressing Count Data
11.4 L1 Regularization and Sparse Models
11.4.1 Dropping Variables with L1 Regularization
11.4.2 Wide Datasets
11.4.3 Using Sparsity Penalties with Other Models
11.5.2 Remember These Facts
11.5.3 Remember These Procedures
12 Boosting
12.1.1 Example: Greedy Stagewise Linear Regression
12.1.3 Greedy Stagewise Regression with Trees
12.2.1 The Loss
12.2.2 Recipe: Stagewise Reduction of Loss
12.2.3 Example: Boosting Decision Stumps
12.2.4 Gradient Boost with Decision Stumps
12.2.5 Gradient Boost with Other Predictors
12.2.6 Example: Is a Prescriber an Opiate Prescriber?
12.2.7 Pruning the Boosted Predictor with the Lasso
12.2.8 Gradient Boosting Software
12.3 You Should
12.3.3 Remember These Facts
12.3.5 Be Able to
Graphical Models
13.1 Markov Chains
13.1.1 Transition Probability Matrices
13.1.2 Stationary Distributions
13.1.3 Example: Markov Chain Models of Text
13.2.1 Hidden Markov Models
13.2.2 Picturing Inference with a Trellis
13.2.3 Dynamic Programming for HMMs: Formalities
13.2.4 Example: Simple Communication Errors
13.3 Learning an HMM
13.3.2 Learning an HMM with EM
13.4.1 Remember These Terms
13.4.3 Be Able to
14.1 Graphical Models
14.1.1 Inference and Graphs
14.1.2 Graphical Models
14.1.3 Learning in Graphical Models
14.2 Conditional Random Field Models for Sequences
14.2.1 MEMMs and Label Bias
14.2.2 Conditional Random Field Models
14.2.3 Learning a CRF Takes Care
14.3.1 Representing the Model
14.3.2 Example: Modelling a Sequence of Digits
14.3.3 Setting Up the Learning Problem
14.3.4 Evaluating the Gradient
14.4.3 Be Able to
15.1 Useful but Intractable Models
15.1.1 Denoising Binary Images with Boltzmann Machines
15.1.2 A Discrete Markov Random Field
15.1.3 Denoising and Segmenting with Discrete MRFs
15.1.4 MAP Inference in Discrete MRFs Can Be Hard
15.2 Variational Inference
15.2.1 The KL Divergence
15.2.2 The Variational Free Energy
15.3 Example: Variational Inference for Boltzmann Machines
15.4.3 Be Able to
Deep Networks
16.1 Units and Classification
16.1.1 Building a Classifier out of Units: The Cost Function
16.1.2 Building a Classifier out of Units: Strategy
16.1.3 Building a Classifier out of Units: Training
16.2 Example: Classifying Credit Card Accounts
16.3.1 Stacking Layers
16.3.2 Jacobians and the Gradient
16.3.3 Setting up Multiple Layers
16.3.4 Gradients and Backpropagation
16.4 Training Multilayer Networks
16.4.1 Software Environments
16.4.2 Dropout and Redundant Units
16.4.3 Example: Credit Card Accounts Revisited
16.4.4 Advanced Tricks: Gradient Scaling
16.5.3 Remember These Procedures
16.5.4 Be Able to
17.1 Image Classification
17.1.1 Pattern Detection by Convolution
17.1.2 Convolutional Layers upon Convolutional Layers
17.2 Two Practical Image Classifiers
17.2.1 Example: Classifying MNIST
17.2.2 Example: Classifying CIFAR-10
17.2.3 Quirks: Adversarial Examples
17.3.5 Be Able to
18.1 Image Classification
18.1.1 Datasets for Classifying Images of Objects
18.1.2 Datasets for Classifying Images of Scenes
18.1.3 Augmentation and Ensembles
18.1.4 AlexNet
18.1.5 VGGNet
18.1.6 Batch Normalization
18.1.7 Computation Graphs
18.1.8 Inception Networks
18.1.9 Residual Networks
18.2.1 How Object Detectors Work
18.2.2 Selective Search
18.2.3 R-CNN, Fast R-CNN and Faster R-CNN
18.2.4 YOLO
18.2.5 Evaluating Detectors
18.3 Further Reading
18.4.2 Remember These Facts
18.4.3 Be Able to
19.1 Better Low Dimensional Maps
19.1.1 Sammon Mapping
19.1.2 T-SNE
19.2 Maps That Make Low-D Representations
19.2.1 Encoders, Decoders, and Autoencoders
19.2.2 Making Data Blocks Bigger
19.2.3 The Denoising Autoencoder
19.3 Generating Images from Examples
19.3.1 Variational Autoencoders
19.3.2 Adversarial Losses: Fooling a Classifier
19.3.3 Matching Distributions with Test Functions
19.3.4 Matching Distributions by Looking at Distances
19.4.1 Remember These Terms
19.4.3 Be Able to
Index
Procedures
Worked Examples
Remember this

The book covers the ideas in machine learning that everyone who is going to use learning tools should know, whatever their chosen specialty or career. Its broad coverage of the area gives the reader enough to get started and to see that deeper knowledge of the topic is worth acquiring. Its practical approach emphasizes using existing tools and packages quickly, with enough pragmatic material on deep networks to get the learner started without needing to study other material.
