Machine Learning on Commodity Tiny Devices : Theory and Practice

Introduction to the book "Machine Learning on Commodity Tiny Devices : Theory and Practice" by Song Guo (computer scientist) and Qihua Zhou, published by CRC Press LLC in 2022. The book is available in PDF format, in English. "Machine Learning on Commodity Tiny Devices : Theory and Practice" is listed under "Uncategorized".

"This book aims at the tiny machine learning (TinyML) software and hardware synergy for edge intelligence applications. It presents on-device learning techniques covering model-level neural network design, algorithm-level training optimization, and hardware-level instruction acceleration. Analyzing the limitations of conventional in-cloud computing would reveal that on-device learning is a promising research direction to meet the requirements of edge intelligence applications. As to the cutting-edge research of TinyML, implementing a high-efficiency learning framework and enabling system-level acceleration is one of the most fundamental issues. This book presents a comprehensive discussion of the latest research progress and provides system-level insights on designing TinyML frameworks, including neural network design, training algorithm optimization and domain-specific hardware acceleration. It identifies the main challenges when deploying TinyML tasks in the real world and guides the researchers to deploy a reliable learning system. This volume will be of interest to students and scholars in the field of edge intelligence, especially to those with sufficient professional Edge AI skills. It will also be an excellent guide for researchers to implement high-performance TinyML systems"-- Provided by publisher

Contents

Cover
Half Title
Title Page
Copyright Page
Contents
List of Figures
List of Tables

CHAPTER 1: Introduction
  1.1. WHAT IS MACHINE LEARNING ON DEVICES?
  1.2. ON-DEVICE LEARNING AND TINYML SYSTEMS
    1.2.1. Property of On-Device Learning
    1.2.2. Objectives of TinyML Systems
  1.3. CHALLENGES FOR REALISTIC IMPLEMENTATION
  1.4. PROBLEM STATEMENT OF BUILDING TINYML SYSTEMS
  1.5. DEPLOYMENT PROSPECTS AND DOWNSTREAM APPLICATIONS
    1.5.1. Evaluation Metrics for Practical Methods
    1.5.2. Intelligent Medical Diagnosis
    1.5.3. AI-Enhanced Motion Tracking
    1.5.4. Domain-Specific Acceleration Chips
  1.6. THE SCOPE AND ORGANIZATION OF THIS BOOK

CHAPTER 2: Fundamentals: On-Device Learning Paradigm
  2.1. MOTIVATION
    2.1.1. Drawbacks of In-Cloud Learning
    2.1.2. Rise of On-Device Learning
    2.1.3. Bit Precision and Data Quantization
    2.1.4. Potential Gains
    2.1.5. Why Not Existing Quantization Methods?
  2.2. BASIC TRAINING ALGORITHMS
    2.2.1. Stochastic Gradient Descent
    2.2.2. Mini-Batch Stochastic Gradient Descent
    2.2.3. Training of Neural Networks
  2.3. PARAMETER SYNCHRONIZATION FOR DISTRIBUTED TRAINING
    2.3.1. Parameter Server Paradigm
    2.3.2. Parameter Synchronization Pace
    2.3.3. Heterogeneity-Aware Distributed Training
  2.4. MULTI-CLIENT ON-DEVICE LEARNING
    2.4.1. Preliminary Experiments
    2.4.2. Observations
      2.4.2.1. Training Convergence Efficiency
      2.4.2.2. Synchronization Frequency
      2.4.2.3. Communication Traffic
    2.4.3. Summary
  2.5. DEVELOPING KITS AND EVALUATION PLATFORMS
    2.5.1. Devices
    2.5.2. Benchmarks
    2.5.3. Pipeline
  2.6. CHAPTER SUMMARY

CHAPTER 3: Preliminary: Theories and Algorithms
  3.1. ELEMENTS OF NEURAL NETWORKS
    3.1.1. Fully Connected Network
    3.1.2. Convolutional Neural Network
    3.1.3. Attention-Based Neural Network
  3.2. MODEL-ORIENTED OPTIMIZATION ALGORITHMS
    3.2.1. Tiny Transformer
    3.2.2. Quantization Strategy for Transformer
  3.3. PRACTICE ON SIMPLE CONVOLUTIONAL NEURAL NETWORKS
    3.3.1. PyTorch Installation
      3.3.1.1. On macOS
      3.3.1.2. On Windows
    3.3.2. CIFAR-10 Dataset
    3.3.3. Construction of CNN Model
      3.3.3.1. Convolutional Layers
      3.3.3.2. Activation Layers
      3.3.3.3. Pooling Layers
      3.3.3.4. Fully Connected Layers
      3.3.3.5. Structure of LeNet-5
    3.3.4. Model Training
    3.3.5. Model Testing
    3.3.6. GPU Acceleration
      3.3.6.1. CUDA Installation
      3.3.6.2. Programming for GPU
    3.3.7. Load Pre-Trained CNNs

CHAPTER 4: Model-Level Design: Computation Acceleration and Communication Saving
  4.1. OPTIMIZATION OF NETWORK ARCHITECTURE
    4.1.1. Network-Aware Parameter Pruning
      4.1.1.1. Pruning Steps
      4.1.1.2. Pruning Strategy
      4.1.1.3. Pruning Metrics
      4.1.1.4. Summary
    4.1.2. Knowledge Distillation
      4.1.2.1. Combination of Loss Functions
      4.1.2.2. Tuning of Hyper-Parameters
      4.1.2.3. Usage of Model Training
      4.1.2.4. Summary
    4.1.3. Model Fine-Tuning
      4.1.3.1. Transfer Learning
      4.1.3.2. Layer-Wise Freezing and Updating
      4.1.3.3. Model-Wise Feature Sharing
      4.1.3.4. Summary
    4.1.4. Neural Architecture Search
      4.1.4.1. Search Space of HW-NAS
      4.1.4.2. Targeted Hardware Platforms
      4.1.4.3. Trend of Current HW-NAS Methods
  4.2. OPTIMIZATION OF TRAINING ALGORITHM
    4.2.1. Low Rank Factorization
    4.2.2. Data-Adaptive Regularization
      4.2.2.1. Core Formulation
      4.2.2.2. On-Device Network Sparsification
      4.2.2.3. Block-Wise Regularization
      4.2.2.4. Summary
    4.2.3. Data Representation and Numerical Quantization
      4.2.3.1. Elements of Quantization
      4.2.3.2. Post-Training Quantization
      4.2.3.3. Quantization-Aware Training
      4.2.3.4. Summary
  4.3. CHAPTER SUMMARY

CHAPTER 5: Hardware-Level Design: Neural Engines and Tensor Accelerators
  5.1. ON-CHIP RESOURCE SCHEDULING
    5.1.1. Embedded Memory Controlling
    5.1.2. Underlying Computational Primitives
    5.1.3. Low-Level Arithmetical Instructions
    5.1.4. MIMO-Based Communication
  5.2. DOMAIN-SPECIFIC HARDWARE ACCELERATION
    5.2.1. Multiple Processing Primitives Scheduling
    5.2.2. I/O Connection Optimization
    5.2.3. Cache Management
    5.2.4. Topology Construction
  5.3. CROSS-DEVICE ENERGY EFFICIENCY
    5.3.1. Multi-Client Collaboration
    5.3.2. Efficiency Analysis
    5.3.3. Problem Formulation for Energy Saving
    5.3.4. Algorithm Design and Pipeline Overview
  5.4. DISTRIBUTED ON-DEVICE LEARNING
    5.4.1. Community-Aware Synchronous Parallel
    5.4.2. Infrastructure Design
    5.4.3. Community Manager
    5.4.4. Weight Learner
      5.4.4.1. Distance Metric Learning
      5.4.4.2. Asynchronous Advantage Actor-Critic
      5.4.4.3. Agent Learning Methodology
    5.4.5. Distributed Training Controller
      5.4.5.1. Intra-Community Synchronization
      5.4.5.2. Inter-Community Synchronization
      5.4.5.3. Communication Traffic Aggregation
  5.5. CHAPTER SUMMARY

CHAPTER 6: Infrastructure-Level Design: Serverless and Decentralized Machine Learning
  6.1. SERVERLESS COMPUTING
    6.1.1. Definition of Serverless Computing
    6.1.2. Architecture of Serverless Computing
      6.1.2.1. Virtualization Layer
      6.1.2.2. Encapsulation Layer
      6.1.2.3. System Orchestration Layer
      6.1.2.4. System Coordination Layer
    6.1.3. Benefits of Serverless Computing
    6.1.4. Challenges of Serverless Computing
      6.1.4.1. Programming and Modeling
      6.1.4.2. Pricing and Cost Prediction
      6.1.4.3. Scheduling
      6.1.4.4. Intra-Communications of Functions
      6.1.4.5. Data Caching
      6.1.4.6. Security and Privacy
  6.2. SERVERLESS MACHINE LEARNING
    6.2.1. Introduction
    6.2.2. Machine Learning and Data Management
    6.2.3. Training Large Models in Serverless Computing
      6.2.3.1. Data Transfer and Parallelism in Serverless Computing
      6.2.3.2. Data Parallelism for Model Training in Serverless Computing
      6.2.3.3. Optimizing Parallelism Structure in Serverless Training
    6.2.4. Cost-Efficiency in Serverless Computing
  6.3. CHAPTER SUMMARY

CHAPTER 7: System-Level Design: From Standalone to Clusters
  7.1. STALENESS-AWARE PIPELINING
    7.1.1. Data Parallelism
    7.1.2. Model Parallelism
      7.1.2.1. Linear Models
      7.1.2.2. Non-Linear Neural Networks
    7.1.3. Hybrid Parallelism
    7.1.4. Extension of Training Parallelism
    7.1.5. Summary
  7.2. INTRODUCTION TO FEDERATED LEARNING
  7.3. TRAINING WITH NON-IID DATA
    7.3.1. The Definition of Non-IID Data
    7.3.2. Enabling Technologies for Non-IID Data
      7.3.2.1. Data Sharing
      7.3.2.2. Robust Aggregation Methods
      7.3.2.3. Other Optimized Methods
  7.4. LARGE-SCALE COLLABORATIVE LEARNING
    7.4.1. Parameter Server
    7.4.2. Decentralized P2P Scheme
    7.4.3. Collective Communication-Based AllReduce
    7.4.4. Data Flow-Based Graph
  7.5. PERSONALIZED LEARNING
    7.5.1. Data-Based Approaches
    7.5.2. Model-Based Approaches
      7.5.2.1. Single Model-Based Methods
      7.5.2.2. Multiple Model-Based Methods
  7.6. PRACTICE ON FL IMPLEMENTATION
    7.6.1. Prerequisites
    7.6.2. Data Distribution
    7.6.3. Local Model Training
    7.6.4. Global Model Aggregation
    7.6.5. A Simple Example
  7.7. CHAPTER SUMMARY

CHAPTER 8: Application: Image-Based Visual Perception
  8.1. IMAGE CLASSIFICATION
    8.1.1. Traditional Image Classification Methods
    8.1.2. Deep Learning-Based Image Classification Methods
    8.1.3. Conclusion
  8.2. IMAGE RESTORATION AND SUPER-RESOLUTION
    8.2.1. Overview
    8.2.2. A Unified Framework for Image Restoration and Super-Resolution
    8.2.3. A Demo of Single Image Super-Resolution
      8.2.3.1. Networks Architecture
      8.2.3.2. Local Aware Attention
      8.2.3.3. Global Aware Attention
      8.2.3.4. LARD Block
  8.3. SELF-ATTENTION AND VISION TRANSFORMERS
  8.4. ENVIRONMENT PERCEPTION: IMAGE SEGMENTATION AND OBJECT DETECTION
    8.4.1. Object Detection
      8.4.1.1. Traditional Object Detection Model
      8.4.1.2. Deep Learning-Based Object Detection Model
    8.4.2. Image Segmentation
      8.4.2.1. Semantic Segmentation
      8.4.2.2. Instance Segmentation
      8.4.2.3. Panoramic Segmentation
  8.5. CHAPTER SUMMARY

CHAPTER 9: Application: Video-Based Real-Time Processing
  9.1. VIDEO RECOGNITION: EVOLVING FROM IMAGES
    9.1.1. Challenges
    9.1.2. Methodologies
      9.1.2.1. Two-Stream Networks
      9.1.2.2. 3D CNNs
  9.2. MOTION TRACKING: LEARN FROM TIME-SPATIAL SEQUENCES
    9.2.1. Deep Learning-Based Tracking
    9.2.2. Optical Flow-Based Tracking
  9.3. POSE ESTIMATION: KEY POINT EXTRACTION
    9.3.1. 2D-Based Extraction
      9.3.1.1. Single Person Estimation
      9.3.1.2. Multiple Human Estimation
    9.3.2. 3D-Based Extraction
  9.4. PRACTICE: REAL-TIME MOBILE HUMAN POSE TRACKING
    9.4.1. Prerequisites and Data Preparation
    9.4.2. Hyper-Parameter Configuration and Model Training
    9.4.3. Realistic Inference and Performance Evaluation
  9.5. CHAPTER SUMMARY

CHAPTER 10: Application: Privacy, Security, Robustness and Trustworthiness in Edge AI
  10.1. PRIVACY PROTECTION METHODS
    10.1.1. Homomorphic Encryption-Enabled Methods
    10.1.2. Differential Privacy-Enabled Methods
    10.1.3. Secure Multi-Party Computation
    10.1.4. Lightweight Private Computation Techniques for Edge AI
      10.1.4.1. Example 1: Lightweight and Secure Decision Tree Classification
      10.1.4.2. Example 2: Lightweight and Secure SVM Classification
  10.2. SECURITY AND ROBUSTNESS
    10.2.1. Practical Issues
    10.2.2. Backdoor Attacks
    10.2.3. Backdoor Defences
  10.3. TRUSTWORTHINESS
    10.3.1. Blockchain and Swarm Learning
    10.3.2. Trusted Execution Environment and Federated Learning
  10.4. CHAPTER SUMMARY

Bibliography
Index
