Online Machine Learning: A Practical Guide with Examples in Python (Machine Learning: Foundations, Methodologies, and Applications)
About the book "Online Machine Learning: A Practical Guide with Examples in Python (Machine Learning: Foundations, Methodologies, and Applications)", edited by Thomas Bartz-Beielstein and Eva Bartz, published by Springer International Publishing in 2024. The book is available in PDF format, in English.
This book deals with the exciting, seminal topic of Online Machine Learning (OML). The content is divided into three parts: the first part looks in detail at the theoretical foundations of OML, compares it to Batch Machine Learning (BML), and discusses which criteria should be developed for a meaningful comparison. The second part provides practical considerations, and the third part substantiates them with concrete practical applications. The book is equally suitable as a reference manual for experts dealing with OML, as a textbook for beginners who want to get started with OML, and as a scientific publication for researchers, since it reflects the latest state of research. It can also serve as a kind of OML consulting: decision-makers and practitioners can use the explanations to tailor OML to their needs, apply it to their own use cases, and weigh whether the benefits of OML outweigh the costs. OML will soon become practical; it is worthwhile to get involved with it now. This book already presents some tools that will facilitate the practice of OML in the future. A promising breakthrough is expected because practice shows that, given the large amounts of data that accumulate, BML alone is no longer sufficient. OML is the solution for evaluating and processing data streams in real time and delivering results that are relevant in practice.
In addition to this book, interactive Jupyter Notebooks and further material about OML are provided in the GitHub repository (https://github.com/sn-code-inside/online-machine-learning). The repository is continuously maintained, so the notebooks may change over time.
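To make the contrast between batch and online learning concrete, here is a minimal sketch in plain Python. It is not taken from the book or its notebooks. It assumes a synthetic two-feature data stream and a simple linear model trained by stochastic gradient descent; the class name `OnlineLinearRegression`, the helper functions, and all parameter values are illustrative. The `predict_one`/`learn_one` method names only mimic the one-sample-at-a-time convention used by libraries such as River. Evaluation follows the test-then-train idea: each sample is first used for prediction and only afterwards for training.

```python
import random


class OnlineLinearRegression:
    """Linear model updated one sample at a time via stochastic gradient descent."""

    def __init__(self, n_features, lr=0.05):
        self.w = [0.0] * n_features  # weights, start at zero
        self.b = 0.0                 # intercept
        self.lr = lr                 # learning rate (illustrative value)

    def predict_one(self, x):
        return self.b + sum(wi * xi for wi, xi in zip(self.w, x))

    def learn_one(self, x, y):
        # Gradient step on the squared error of this single sample.
        err = self.predict_one(x) - y
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err


def stream(n, seed=0):
    """Hypothetical data stream: y = 2*x0 - x1 + 0.5 plus small noise."""
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        y = 2.0 * x[0] - 1.0 * x[1] + 0.5 + rng.gauss(0, 0.01)
        yield x, y


def progressive_validation(model, data):
    """Test-then-train: predict on each sample before learning from it."""
    abs_err_sum = 0.0
    n = 0
    for x, y in data:
        abs_err_sum += abs(model.predict_one(x) - y)  # test first
        model.learn_one(x, y)                         # then train
        n += 1
    return abs_err_sum / n  # mean absolute error over the stream


model = OnlineLinearRegression(n_features=2)
mae = progressive_validation(model, stream(5000))
print(f"progressive MAE: {mae:.4f}")
print("weights:", [round(w, 2) for w in model.w], "intercept:", round(model.b, 2))
```

The model never holds more than one sample in memory, which is the property that distinguishes this style of learning from a batch fit over the full data set.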
Table of Contents
Foreword
Preface
Contents
Contributors
1 Introduction: From Batch to Online Machine Learning
  1.1 Streaming Data
  1.2 Disadvantages of Batch Learning
    1.2.1 Memory Requirements
    1.2.2 Drift
    1.2.3 New, Unknown Data
    1.2.4 Accessibility and Availability of the Data
    1.2.5 Other Problems
  1.3 Incremental Learning, Online Learning, and Stream Learning
  1.4 Transitioning Batch to Online Machine Learning
  References
2 Supervised Learning: Classification and Regression
  2.1 Classification
    2.1.1 Baseline Algorithms
    2.1.2 The Naive-Bayes Classifier
    2.1.3 Tree-Based Methods
    2.1.4 Other Classification Methods
  2.2 Regression
    2.2.1 Online Linear Regression
    2.2.2 Hoeffding Tree Regressor
  2.3 Ensemble Methods for OML
  2.4 Clustering
  2.5 Overview: OML Methods
  References
3 Drift Detection and Handling
  3.1 Architectures for Drift Detection Methods
    3.1.1 Adaptive Estimators
    3.1.2 Change Detectors
    3.1.3 Ensemble-Based Approaches
  3.2 Basic Considerations for Windowing Techniques
  3.3 Popular Drift Detection Methods
    3.3.1 Statistical Tests for Drift and Change Detection
    3.3.2 Control Charts
    3.3.3 Adaptive Windowing (ADWIN)
    3.3.4 Implicit Drift Detection Algorithms
  3.4 OML Algorithms with Drift Detection: Hoeffding-Window Trees
    3.4.1 Concept-Adapting Very Fast Decision Trees (CVFDT)
    3.4.2 Hoeffding Adaptive Trees (HAT)
    3.4.3 Overview: Hoeffding-Window Trees
    3.4.4 Overview: HT in River
  3.5 Drift Scaling in Online Machine Learning
    3.5.1 Statistical Measures in a Sequential Manner
    3.5.2 Adapted Scaling Techniques
  References
4 Initial Selection and Subsequent Updating of OML Models
  4.1 Initial Model Selection
  4.2 Updating and Changing the Model
    4.2.1 Adding New Features
    4.2.2 Manual Model Changes in Response to Drift
    4.2.3 Ensuring Model Quality After a Model Update
  4.3 Catastrophic Forgetting
    4.3.1 Strategies for Dealing with Catastrophic Forgetting
  References
5 Evaluation and Performance Measurement
  5.1 Data Selection Methods
    5.1.1 Holdout Selection
    5.1.2 Progressive Validation: Interleaved Test-Then-Train
    5.1.3 Machine Learning in Batch Mode with a Prediction Horizon
    5.1.4 Landmark Batch Machine Learning with a Prediction Horizon
    5.1.5 Window-Batch Method with Prediction Horizon
    5.1.6 Online-Machine Learning with a Prediction Horizon
    5.1.7 Online-Machine Learning
  5.2 Determining the Training and Test Data Set in the Package spotRiver
    5.2.1 Methods for BML and OML
    5.2.2 Methods for OML River
  5.3 Algorithm (Model) Performance
  5.4 Data Stream and Drift Generators
    5.4.1 Data Stream Generators in Sklearn
    5.4.2 SEA-Drift Generator
    5.4.3 Friedman-Drift Generator
  5.5 Summary
  References
6 Special Requirements for Online Machine Learning Methods
  6.1 Missing Data, Imputation
  6.2 Categorical Attributes
  6.3 Outlier and Anomaly Detection
    6.3.1 Additional Anomaly Detection Methods for Time-Series Data
    6.3.2 One-Class SVM for Anomaly Detection
    6.3.3 Algorithms for Anomaly Detection in river
  6.4 Imbalanced Data
  6.5 Large Number of Features (Attributes)
  6.6 FAIR, Interpretability, and Explainability
  References
7 Practical Applications of Online Machine Learning
  7.1 Applications and Application Perspectives in Official Statistics
    7.1.1 Potentials and Challenges
    7.1.2 Compatibility with Quality Criteria
    7.1.3 Embedding in the Statistics Production Process
    7.1.4 (Online) Machine Learning Applications in Statistical Institutions
    7.1.5 Other Applications with Reference to Official Statistics
    7.1.6 Summary: OML in Official Statistics
  7.2 Industrial Application of OML in the Context of Hot Rolling
    7.2.1 Hot Rolling
    7.2.2 Machine Learning in Hot Rolling
    7.2.3 Drift in Hot Rolling
    7.2.4 Application of OML in Hot Rolling
    7.2.5 Summary: OML in Hot Rolling
  7.3 Summary: Aspects of OML Implementation in Practice
    7.3.1 Recommendations for the Implementation Process
    7.3.2 Expenditure for Implementation and Maintenance
    7.3.3 Application and Diffusion in Practice
    7.3.4 Overall Conclusions
  References
8 Open-Source Software for Online Machine Learning
  8.1 Overview and Description of Software Packages for Online Machine Learning
    8.1.1 MOA
    8.1.2 RMOA
    8.1.3 Stream
    8.1.4 River
  8.2 Scope of the Software Packages
  8.3 Programming Languages: A Brief Comparison
  References
9 An Experimental Comparison of Batch and Online Machine Learning Algorithms
  9.1 Study: Bike Sharing
    9.1.1 Overview: Models
    9.1.2 Linear Regression
    9.1.3 Gradient Boosting
    9.1.4 Hoeffding Regression Trees
    9.1.5 Final Comparison of the Bike-Sharing Experiments
    9.1.6 Summary: Bike-Sharing Experiments
  9.2 Study: Very Large Data Sets With Drift
    9.2.1 The Friedman-Drift Data Set
    9.2.2 Algorithms
    9.2.3 Results
  9.3 Study: Drift Scaling in Online Machine Learning
  9.4 Summary
  References
10 Hyperparameter Tuning
  10.1 Hyperparameter Tuning: An Introduction
  10.2 The Hyperparameter-Tuning-Software SPOT
  10.3 Study: Hyperparameter Tuning of the HATR Algorithm on the Friedman-Drift Data
    10.3.1 Loading the Data
    10.3.2 Specification of the Preprocessing Model
    10.3.3 Selection of the Algorithm to be Tuned and the Default Hyperparameters
    10.3.4 Modification of the Default Values for the Hyperparameters
    10.3.5 Selection of the Target Function (Loss Function)
    10.3.6 Calling the Hyperparameter Tuner SPOT
    10.3.7 Visualization with TensorBoard
    10.3.8 hatr Tuning Results
    10.3.9 Explainability and Understanding
  10.4 Summary
  References
11 Summary and Outlook
  11.1 Necessity for OML Methods
  11.2 Recommendations for Using OML in Practice
  References
Appendix A Definitions and Explanations
  A.1 Gradient Descent
  A.2 Bayes' Theorem
  A.3 Hoeffding Bound
  A.4 Kappa Statistics
Appendix B Supplementary Materials
  B.1 Notebooks
  B.2 Software
Glossary
Index