Machine Learning Methods

About the book "Machine Learning Methods" by Hang Li, Lu Lin, and Huanqiang Zeng, published by Springer in 2023. The book is available in PDF format, in English. "Machine Learning Methods" is listed under the Uncategorized category.

This book provides a comprehensive and systematic introduction to the principal machine learning methods, covering both supervised and unsupervised learning. It discusses essential methods of classification and regression in supervised learning, such as decision trees, perceptrons, support vector machines, maximum entropy models, logistic regression models and multiclass classification, as well as methods applied in supervised learning, like the hidden Markov model and conditional random fields. In the context of unsupervised learning, it examines clustering and other problems as well as methods such as singular value decomposition, principal component analysis and latent semantic analysis. As a fundamental book on machine learning, it addresses the needs of researchers and students who apply machine learning as an important tool in their research, especially those in fields such as information retrieval, natural language processing and text data mining. To understand the concepts and methods discussed, readers are expected to have an elementary knowledge of advanced mathematics, linear algebra, and probability and statistics. The detailed explanations of basic principles, underlying concepts and algorithms enable readers to grasp basic techniques, while the rigorous mathematical derivations and specific examples offer valuable insights into machine learning.
Preface
Contents
1 Introduction to Machine Learning and Supervised Learning
  1.1 Machine Learning
    1.1.1 Characteristics of Machine Learning
    1.1.2 The Object of Machine Learning
    1.1.3 The Purpose of Machine Learning
    1.1.4 Methods of Machine Learning
    1.1.5 The Study of Machine Learning
    1.1.6 The Importance of Machine Learning
  1.2 Classification of Machine Learning
    1.2.1 The Basic Classification
    1.2.2 Classification by Model Types
    1.2.3 Classification by Algorithm
    1.2.4 Classification by Technique
  1.3 Three Elements of Machine Learning Methods
    1.3.1 Model
    1.3.2 Strategy
    1.3.3 Algorithm
  1.4 Model Evaluation and Model Selection
    1.4.1 Training Error and Test Error
    1.4.2 Over-Fitting and Model Selection
  1.5 Regularization and Cross-Validation
    1.5.1 Regularization
    1.5.2 Cross-Validation
  1.6 Generalization Ability
    1.6.1 Generalization Error
    1.6.2 Generalization Error Bound
  1.7 Generative Approach and Discriminative Model
  1.8 Supervised Learning Application
    1.8.1 Classification
    1.8.2 Tagging
    1.8.3 Regression
  References
2 Perceptron
  2.1 The Perceptron Model
  2.2 Perceptron Learning Strategy
    2.2.1 Linear Separability of the Dataset
    2.2.2 Perceptron Learning Strategy
  2.3 Perceptron Learning Algorithm
    2.3.1 The Primal Form of the Perceptron Learning Algorithm
    2.3.2 Convergence of the Algorithm
    2.3.3 The Dual Form of the Perceptron Learning Algorithm
  References
3 K-Nearest Neighbor
  3.1 The K-Nearest Neighbor Algorithm
  3.2 The K-Nearest Neighbor Model
    3.2.1 Model
    3.2.2 Distance Metrics
    3.2.3 The Selection of k Value
    3.2.4 Classification Decision Rule
  3.3 Implementation of K-Nearest Neighbor: The kd-Tree
    3.3.1 Constructing the kd-Tree
    3.3.2 Searching for kd-Tree
  References
4 The Naïve Bayes Method
  4.1 The Learning and Classification of Naïve Bayes
    4.1.1 Basic Methods
    4.1.2 Implications of Posterior Probability Maximization
  4.2 Parameter Estimation of the Naïve Bayes Method
    4.2.1 Maximum Likelihood Estimation
    4.2.2 Learning and Classification Algorithms
    4.2.3 Bayesian Estimation
  References
5 Decision Tree
  5.1 Decision Tree Model and Learning
    5.1.1 Decision Tree Model
    5.1.2 Decision Tree and If-Then Rules
    5.1.3 Decision Tree and Conditional Probability Distributions
    5.1.4 Decision Tree Learning
  5.2 Feature Selection
    5.2.1 The Feature Selection Problem
    5.2.2 Information Gain
    5.2.3 Information Gain Ratio
  5.3 Generation of Decision Tree
    5.3.1 ID3 Algorithm
    5.3.2 C4.5 Generation Algorithm
  5.4 Pruning of Decision Tree
  5.5 CART Algorithm
    5.5.1 CART Generation
    5.5.2 CART Pruning
  References
6 Logistic Regression and Maximum Entropy Model
  6.1 Logistic Regression Model
    6.1.1 Logistic Distribution
    6.1.2 Binomial Logistic Regression Model
    6.1.3 Model Parameter Estimation
    6.1.4 Multi-nomial Logistic Regression
  6.2 Maximum Entropy Model
    6.2.1 Maximum Entropy Principle
    6.2.2 Definition of Maximum Entropy Model
    6.2.3 Learning of the Maximum Entropy Model
    6.2.4 Maximum Likelihood Estimation
  6.3 Optimization Algorithm of Model Learning
    6.3.1 Improved Iterative Scaling
    6.3.2 Quasi-Newton Method
  References
7 Support Vector Machine
  7.1 Linear Support Vector Machine in the Linearly Separable Case and Hard Margin Maximization
    7.1.1 Linear Support Vector Machine in the Linearly Separable Case
    7.1.2 Function Margin and Geometric Margin
    7.1.3 Maximum Margin
    7.1.4 Dual Algorithm of Learning
  7.2 Linear Support Vector Machine and Soft Margin Maximization
    7.2.1 Linear Support Vector Machine
    7.2.2 Dual Learning Algorithm
    7.2.3 Support Vector
    7.2.4 Hinge Loss Function
  7.3 Non-Linear Support Vector Machine and Kernel Functions
    7.3.1 Kernel Trick
    7.3.2 Positive Definite Kernel
    7.3.3 Commonly Used Kernel Functions
    7.3.4 Nonlinear Support Vector Classifier
  7.4 Sequential Minimal Optimization Algorithm
    7.4.1 The Method of Solving Two-Variable Quadratic Programming
    7.4.2 Selection Methods of Variables
    7.4.3 SMO Algorithm
  References
8 Boosting
  8.1 AdaBoost Algorithm
    8.1.1 The Basic Idea of Boosting
    8.1.2 AdaBoost Algorithm
    8.1.3 AdaBoost Example
  8.2 Training Error Analysis of AdaBoost Algorithm
  8.3 Explanation of AdaBoost Algorithm
    8.3.1 Forward Stepwise Algorithm
    8.3.2 Forward Stepwise Algorithm and AdaBoost
  8.4 Boosting Tree
    8.4.1 Boosting Tree Model
    8.4.2 Boosting Tree Algorithm
    8.4.3 Gradient Boosting
  References
9 EM Algorithm and Its Extensions
  9.1 Introduction of the EM Algorithm
    9.1.1 EM Algorithm
    9.1.2 Derivation of the EM Algorithm
    9.1.3 Application of the EM Algorithm in Unsupervised Learning
  9.2 The Convergence of the EM Algorithm
  9.3 Application of the EM Algorithm in the Learning of the Gaussian Mixture Model
    9.3.1 Gaussian Mixture Model
    9.3.2 The EM Algorithm for Parameter Estimation of the Gaussian Mixture Model
  9.4 Extensions of the EM Algorithm
    9.4.1 The Maximization-Maximization Algorithm of F-Function
    9.4.2 GEM Algorithm
  9.5 Summary
  9.6 Further Reading
  9.7 Exercises
  References
10 Hidden Markov Model
  10.1 The Basic Concept of Hidden Markov Model
    10.1.1 Definition of Hidden Markov Model
    10.1.2 The Generation Process of the Observation Sequence
    10.1.3 Three Basic Problems of the Hidden Markov Model
  10.2 Probability Calculation Algorithms
    10.2.1 Direct Calculation Method
    10.2.2 Forward Algorithm
    10.2.3 Backward Algorithm
    10.2.4 Calculation of Some Probabilities and Expected Values
  10.3 Learning Algorithms
    10.3.1 Supervised Learning Methods
    10.3.2 Baum-Welch Algorithm
    10.3.3 Baum-Welch Model Parameter Estimation Formula
  10.4 Prediction Algorithm
    10.4.1 Approximation Algorithm
    10.4.2 Viterbi Algorithm
  References
11 Conditional Random Field
  11.1 Probabilistic Undirected Graphical Model
    11.1.1 Model Definition
    11.1.2 Factorization of Probabilistic Undirected Graphical Model
  11.2 The Definition and Forms of Conditional Random Field
    11.2.1 The Definition of Conditional Random Field
    11.2.2 The Parameterized Form of the Conditional Random Field
    11.2.3 The Simplified Form of Conditional Random Field
    11.2.4 The Matrix Form of the Conditional Random Field
  11.3 The Probability Computation Problem of Conditional Random Field
    11.3.1 Forward–Backward Algorithm
    11.3.2 Probability Computation
    11.3.3 The Computation of Expected Value
  11.4 Learning Algorithms of Conditional Random Field
    11.4.1 Improved Iterative Scaling
    11.4.2 Quasi-Newton Method
  11.5 The Prediction Algorithm of Conditional Random Field
  References
12 Summary of Supervised Learning Methods
  12.1 Application
  12.2 Models
  12.3 Learning Strategies
  12.4 Learning Algorithms
13 Introduction to Unsupervised Learning
  13.1 The Fundamentals of Unsupervised Learning
  13.2 Basic Issues
    13.2.1 Clustering
    13.2.2 Dimensionality Reduction
    13.2.3 Probability Model Estimation
  13.3 Three Elements of Machine Learning
  13.4 Unsupervised Learning Methods
    13.4.1 Clustering
    13.4.2 Dimensionality Reduction
    13.4.3 Topic Modeling
    13.4.4 Graph Analytics
  References
14 Clustering
  14.1 Basic Concepts of Clustering
    14.1.1 Similarity or Distance
    14.1.2 Class or Cluster
    14.1.3 Distance Between Classes
  14.2 Hierarchical Clustering
  14.3 k-means Clustering
    14.3.1 Model
    14.3.2 Strategy
    14.3.3 Algorithm
    14.3.4 Algorithm Characteristics
  References
15 Singular Value Decomposition
  15.1 Introduction
  15.2 Definition and Properties of Singular Value Decomposition
    15.2.1 Definition and Theorem
    15.2.2 Compact Singular Value Decomposition and Truncated Singular Value Decomposition
    15.2.3 Geometry Interpretation
    15.2.4 Main Properties
  15.3 Computation of Singular Value Decomposition
  15.4 Singular Value Decomposition and Matrix Approximation
    15.4.1 Frobenius Norm
    15.4.2 Optimal Approximation of the Matrix
    15.4.3 The Outer Product Expansion of Matrix
  References
16 Principal Component Analysis
  16.1 Overall Principal Component Analysis
    16.1.1 Basic Ideas
    16.1.2 Definition and Derivation
    16.1.3 Main Properties
    16.1.4 The Number of Principal Components
    16.1.5 The Overall Principal Components of Normalized Variables
  16.2 Sample Principal Component Analysis
    16.2.1 The Definition and Properties of the Sample Principal Components
    16.2.2 Eigenvalue Decomposition Algorithm of Correlation Matrix
    16.2.3 Singular Value Decomposition Algorithm for Data Matrix
  References
17 Latent Semantic Analysis
  17.1 Word Vector Space and Topic Vector Space
    17.1.1 Word Vector Space
    17.1.2 Topic Vector Space
  17.2 Latent Semantic Analysis Algorithm
    17.2.1 Matrix Singular Value Decomposition Algorithm
    17.2.2 Examples
  17.3 Non-negative Matrix Factorization Algorithm
    17.3.1 Non-negative Matrix Factorization
    17.3.2 Latent Semantic Analysis Model
    17.3.3 Formalization of Non-negative Matrix Factorization
    17.3.4 Algorithm
  References
18 Probabilistic Latent Semantic Analysis
  18.1 Probabilistic Latent Semantic Analysis Model
    18.1.1 Basic Ideas
    18.1.2 Generative Model
    18.1.3 Co-occurrence Model
    18.1.4 Model Properties
  18.2 Algorithms for Probabilistic Latent Semantic Analysis
  References
19 Markov Chain Monte Carlo Method
  19.1 Monte Carlo Method
    19.1.1 Random Sampling
    19.1.2 Mathematical Expectation Estimate
    19.1.3 Integral Computation
  19.2 Markov Chain
    19.2.1 Basic Definition
    19.2.2 Discrete-Time Markov Chain
    19.2.3 Continuous-Time Markov Chain
    19.2.4 Properties of Markov Chain
  19.3 Markov Chain Monte Carlo Method
    19.3.1 Basic Ideas
    19.3.2 Basic Steps
    19.3.3 Markov Chain Monte Carlo Method and Machine Learning
  19.4 Metropolis–Hastings Algorithm
    19.4.1 Fundamental Concepts
    19.4.2 Metropolis–Hastings Algorithm
    19.4.3 The Single-Component Metropolis–Hastings Algorithm
  19.5 Gibbs Sampling
    19.5.1 Basic Principles
    19.5.2 Gibbs Sampling Algorithm
    19.5.3 Sampling Computation
  References
20 Latent Dirichlet Allocation
  20.1 Dirichlet Distribution
    20.1.1 Definition of Distribution
    20.1.2 Conjugate Prior
  20.2 Latent Dirichlet Allocation Model
    20.2.1 Basic Ideas
    20.2.2 Model Definition
    20.2.3 Probability Graphical Model
    20.2.4 The Changeability of Random Variable Sequences
    20.2.5 Probability Formula
  20.3 Gibbs Sampling Algorithm for LDA
    20.3.1 Basic Ideas
    20.3.2 Major Parts of Algorithm
    20.3.3 Algorithm Post-processing
    20.3.4 Algorithm
  20.4 Variational EM Algorithm for LDA
    20.4.1 Variational Reasoning
    20.4.2 Variational EM Algorithm
    20.4.3 Algorithm Derivation
    20.4.4 Algorithm Summary
  References
21 The PageRank Algorithm
  21.1 The Definition of PageRank
    21.1.1 Basic Ideas
    21.1.2 The Directed Graph and Random Walk Model
    21.1.3 The Basic Definition of PageRank
    21.1.4 General Definition of PageRank
  21.2 Computation of PageRank
    21.2.1 Iterative Algorithm
    21.2.2 Power Method
    21.2.3 Algebraic Algorithms
  References
22 A Summary of Unsupervised Learning Methods
  22.1 The Relationships and Characteristics of Unsupervised Learning Methods
    22.1.1 The Relationships Between Various Methods
    22.1.2 Unsupervised Learning Methods
    22.1.3 Basic Machine Learning Methods
  22.2 The Relationships and Characteristics of Topic Models
  References
Appendix A Gradient Descent
Appendix B Newton Method and Quasi-Newton Method
Appendix C Lagrange Duality
Appendix D Basic Subspaces of Matrix
Appendix E The Definition of KL Divergence and the Properties of Dirichlet Distribution
Color Diagrams
Index

This book is a popular machine learning textbook and reference book in China. The first edition was published in March 2012 under the title Statistical Learning Methods, focusing on supervised learning, including the perceptron, k-nearest neighbor method, naive Bayes method, decision tree, logistic regression, maximum entropy model, support vector machine, boosting method, EM algorithm, hidden Markov model, and conditional random field. The second edition was published in May 2019, with additional content on unsupervised learning, including the clustering method, singular value decomposition, principal component analysis, latent semantic analysis, probabilistic latent semantic analysis, Markov chain Monte Carlo method, latent Dirichlet allocation, and the PageRank algorithm. By the end of 2021, the two editions had been printed more than 30 times and sold more than 350,000 copies.
