Accelerators for Convolutional Neural Networks

An introduction to the book "Accelerators for Convolutional Neural Networks" by Arslan Munir, Joonho Kong, and Mahmood Azhar Qureshi, published by John Wiley & Sons in 2023. The book is available in PDF format, in English.

Comprehensive and thorough resource exploring different types of convolutional neural networks and complementary accelerators

Accelerators for Convolutional Neural Networks provides basic deep learning knowledge and instructive content for Internet of Things (IoT) and edge computing practitioners who want to build convolutional neural network (CNN) accelerators. The book explains compressive coding for CNNs; presents a two-step lossless method for compressing input feature maps; discusses an arithmetic coding-based lossless weight compression method and the design of an associated decoding method; describes contemporary sparse CNNs that consider sparsity in both weights and activation maps; and discusses hardware/software co-design and co-scheduling techniques that can lead to better optimization and utilization of the available hardware resources for CNN acceleration. The first part of the book provides an overview of CNNs along with the composition and parameters of different contemporary CNN models. Later chapters focus on compressive coding for CNNs, the design of dense and sparse CNN accelerators, and hardware/software co-design and co-scheduling. The book also provides directions for future research and development of CNN accelerators.
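The blurb mentions accelerators that exploit sparsity in both weights and activation maps. One common way to do this is to store each operand as a binary mask of nonzero positions plus a packed array of nonzero values, then AND the two masks so that only useful multiplications are performed. The Python sketch below is a generic, software-only illustration of that idea under our own assumptions; the names `compress` and `sparse_dot` are hypothetical, and this is not the Sparse-PE or Phantom design described in the book.

```python
from typing import List, Sequence, Tuple

# A compressed operand: (bitmask of nonzero positions, nonzero values in index order).
Compressed = Tuple[int, List[int]]


def compress(vec: Sequence[int]) -> Compressed:
    """Encode a dense vector as a bitmask plus its packed nonzero values."""
    mask = 0
    values: List[int] = []
    for i, v in enumerate(vec):
        if v != 0:
            mask |= 1 << i      # bit i set <=> vec[i] is nonzero
            values.append(v)
    return mask, values


def sparse_dot(w: Compressed, a: Compressed) -> int:
    """Dot product that multiplies only where weight AND activation are nonzero."""
    w_mask, w_vals = w
    a_mask, a_vals = a
    both = w_mask & a_mask      # the only positions that contribute to the sum
    total = 0
    while both:
        low = both & -both      # isolate the lowest remaining matching position
        below = low - 1         # bitmask of all positions before it
        # Index into each packed array = count of that operand's nonzeros before this position.
        wi = bin(w_mask & below).count("1")
        ai = bin(a_mask & below).count("1")
        total += w_vals[wi] * a_vals[ai]
        both ^= low             # mark this position as processed
    return total


if __name__ == "__main__":
    weights = [0, 3, 0, -2, 5, 0]
    acts = [4, 1, 0, 0, 2, 7]
    # Only positions 1 and 4 are nonzero in both: 3*1 + 5*2 = 13.
    print(sparse_dot(compress(weights), compress(acts)))
```

Packing only the nonzeros shrinks storage, and the mask intersection skips every multiply in which either operand is zero. A hardware implementation would more likely compute the packed-array offsets with dedicated prefix-sum or priority-encoder logic than with a sequential loop like this one.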
Other sample topics covered in Accelerators for Convolutional Neural Networks include:

- How to apply arithmetic coding and decoding with range scaling for lossless compression of 5-bit CNN weights, so that CNNs can be deployed in extremely resource-constrained systems
- State-of-the-art research on dense CNN accelerators, which are mostly based on systolic arrays or parallel multiply-accumulate (MAC) arrays
- The iMAC dense CNN accelerator, which combines image-to-column (im2col) and general matrix multiplication (GEMM) hardware acceleration
- A multi-threaded, low-cost, log-based processing element (PE) core, instances of which are stacked in a spatial grid to form the NeuroMAX dense accelerator
- Sparse-PE, a multi-threaded and flexible CNN PE core that exploits sparsity in both weights and activation maps, instances of which can be stacked in a spatial grid to build sparse CNN accelerators

For researchers in AI, computer vision, computer architecture, and embedded systems, along with graduate and senior undergraduate students in related programs of study, Accelerators for Convolutional Neural Networks is an essential resource for understanding the many facets of the subject and its relevant applications.
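The iMAC topic refers to lowering a convolution to a matrix multiplication with im2col followed by GEMM. As a rough software illustration of that general technique (not the iMAC hardware itself), the pure-Python sketch below unrolls every receptive field into a column and multiplies it by a matrix of flattened filters. The function names are hypothetical, and the sketch assumes no padding, square kernels, and the cross-correlation form of convolution used by CNN frameworks.

```python
from typing import List

Matrix = List[List[int]]
Tensor3 = List[Matrix]  # indexed [channel][row][col]


def im2col(x: Tensor3, k: int, stride: int = 1) -> Matrix:
    """Unroll a C x H x W input into a (C*k*k) x (out_h*out_w) matrix.

    Column j holds every input value (across all channels) in the k x k
    receptive field of output pixel j. No padding is applied.
    """
    c, h, w = len(x), len(x[0]), len(x[0][0])
    out_h = (h - k) // stride + 1
    out_w = (w - k) // stride + 1
    rows: Matrix = []
    for ch in range(c):
        for di in range(k):
            for dj in range(k):
                rows.append([x[ch][oi * stride + di][oj * stride + dj]
                             for oi in range(out_h) for oj in range(out_w)])
    return rows


def gemm(a: Matrix, b: Matrix) -> Matrix:
    """Naive GEMM: (M x K) times (K x N) gives (M x N)."""
    n_inner, n_cols = len(b), len(b[0])
    return [[sum(a[i][t] * b[t][j] for t in range(n_inner)) for j in range(n_cols)]
            for i in range(len(a))]


def conv2d_im2col(x: Tensor3, filters: List[Tensor3], stride: int = 1) -> Matrix:
    """Convolve x with M filters (each C x k x k) via im2col + GEMM.

    Returns an M x (out_h*out_w) matrix; row m is output channel m in
    row-major order.
    """
    k = len(filters[0][0])
    cols = im2col(x, k, stride)
    # Flatten each filter in the same (channel, row, col) order im2col uses.
    w_mat = [[f[ch][di][dj] for ch in range(len(f)) for di in range(k) for dj in range(k)]
             for f in filters]
    return gemm(w_mat, cols)


if __name__ == "__main__":
    image = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]]]   # 1 channel, 3 x 3
    kernel = [[[1, 0], [0, -1]]]                   # 1 channel, 2 x 2
    print(conv2d_im2col(image, [kernel]))          # [[-4, -4, -4, -4]]
```

The trade-off is the usual one for im2col: overlapping input values are duplicated (with stride 1, an interior pixel appears in up to k*k columns per channel) in exchange for turning the whole layer into a single dense matrix multiply, which maps naturally onto parallel MAC arrays.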
Cover
Title Page
Copyright
Contents
About the Authors
Preface
Part I Overview
  Chapter 1 Introduction
    1.1 History and Applications
    1.2 Pitfalls of High-Accuracy DNNs/CNNs
      1.2.1 Compute and Energy Bottleneck
      1.2.2 Sparsity Considerations
    1.3 Chapter Summary
  Chapter 2 Overview of Convolutional Neural Networks
    2.1 Deep Neural Network Architecture
    2.2 Convolutional Neural Network Architecture
      2.2.1 Data Preparation
      2.2.2 Building Blocks of CNNs
        2.2.2.1 Convolutional Layers
        2.2.2.2 Pooling Layers
        2.2.2.3 Fully Connected Layers
      2.2.3 Parameters of CNNs
      2.2.4 Hyperparameters of CNNs
        2.2.4.1 Hyperparameters Related to Network Structure
        2.2.4.2 Hyperparameters Related to Training
        2.2.4.3 Hyperparameter Tuning
    2.3 Popular CNN Models
      2.3.1 AlexNet
      2.3.2 VGGNet
      2.3.3 GoogleNet
      2.3.4 SqueezeNet
      2.3.5 Binary Neural Networks
      2.3.6 EfficientNet
    2.4 Popular CNN Datasets
      2.4.1 MNIST Dataset
      2.4.2 CIFAR
      2.4.3 ImageNet
    2.5 CNN Processing Hardware
      2.5.1 Temporal Architectures
      2.5.2 Spatial Architectures
      2.5.3 Near-Memory Processing
    2.6 Chapter Summary
Part II Compressive Coding for CNNs
  Chapter 3 Contemporary Advances in Compressive Coding for CNNs
    3.1 Background of Compressive Coding
    3.2 Compressive Coding for CNNs
    3.3 Lossy Compression for CNNs
    3.4 Lossless Compression for CNNs
    3.5 Recent Advancements in Compressive Coding for CNNs
    3.6 Chapter Summary
  Chapter 4 Lossless Input Feature Map Compression
    4.1 Two-Step Input Feature Map Compression Technique
    4.2 Evaluation
    4.3 Chapter Summary
  Chapter 5 Arithmetic Coding and Decoding for 5-Bit CNN Weights
    5.1 Architecture and Design Overview
    5.2 Algorithm Overview
      5.2.1 Weight Encoding Algorithm
    5.3 Weight Decoding Algorithm
    5.4 Encoding and Decoding Examples
      5.4.1 Decoding Hardware
    5.5 Evaluation Methodology
    5.6 Evaluation Results
      5.6.1 Compression Ratio and Memory Energy Consumption
      5.6.2 Latency Overhead
      5.6.3 Latency vs. Resource Usage Trade-Off
      5.6.4 System-Level Energy Estimation
    5.7 Chapter Summary
Part III Dense CNN Accelerators
  Chapter 6 Contemporary Dense CNN Accelerators
    6.1 Background on Dense CNN Accelerators
    6.2 Representation of the CNN Weights and Feature Maps in Dense Format
    6.3 Popular Architectures for Dense CNN Accelerators
    6.4 Recent Advancements in Dense CNN Accelerators
    6.5 Chapter Summary
  Chapter 7 iMAC: Image-to-Column and General Matrix Multiplication-Based Dense CNN Accelerator
    7.1 Background and Motivation
    7.2 Architecture
    7.3 Implementation
    7.4 Chapter Summary
  Chapter 8 NeuroMAX: A Dense CNN Accelerator
    8.1 Related Work
    8.2 Log Mapping
    8.3 Hardware Architecture
      8.3.1 Top-Level
      8.3.2 PE Matrix
    8.4 Data Flow and Processing
      8.4.1 3×3 Convolution
      8.4.2 1×1 Convolution
      8.4.3 Higher-Order Convolutions
    8.5 Implementation and Results
    8.6 Chapter Summary
Part IV Sparse CNN Accelerators
  Chapter 9 Contemporary Sparse CNN Accelerators
    9.1 Background of Sparsity in CNN Models
    9.2 Background of Sparse CNN Accelerators
    9.3 Recent Advancements in Sparse CNN Accelerators
    9.4 Chapter Summary
  Chapter 10 CNN Accelerator for In Situ Decompression and Convolution of Sparse Input Feature Maps
    10.1 Overview
    10.2 Hardware Design Overview
    10.3 Design Optimization Techniques Utilized in the Hardware Accelerator
    10.4 FPGA Implementation
    10.5 Evaluation Results
      10.5.1 Performance and Energy
      10.5.2 Comparison with State-of-the-Art Hardware Accelerator Implementations
    10.6 Chapter Summary
  Chapter 11 Sparse-PE: A Sparse CNN Accelerator
    11.1 Related Work
    11.2 Sparse-PE
      11.2.1 Sparse Binary Mask
      11.2.2 Selection
      11.2.3 Computation
      11.2.4 Accumulation
      11.2.5 Output Encoding
    11.3 Implementation and Results
      11.3.1 Cycle-Accurate Simulator
        11.3.1.1 Performance with Varying Sparsity
        11.3.1.2 Comparison Against Past Approaches
      11.3.2 RTL Implementation
    11.4 Chapter Summary
  Chapter 12 Phantom: A High-Performance Computational Core for Sparse CNNs
    12.1 Related Work
    12.2 Phantom
      12.2.1 Sparse Mask Representation
      12.2.2 Core Architecture
      12.2.3 Lookahead Masking
      12.2.4 Top-Down Selector
        12.2.4.1 In-Order Selection
        12.2.4.2 Out-of-Order Selection
      12.2.5 Thread Mapper
      12.2.6 Compute Engine
      12.2.7 Output Buffer
      12.2.8 Output Encoding
    12.3 Phantom-2D
      12.3.1 R×C Compute Matrix
      12.3.2 Load Balancing
      12.3.3 Regular/Depthwise Convolution
        12.3.3.1 Intercore Balancing
      12.3.4 Pointwise Convolution
      12.3.5 FC Layers
      12.3.6 Intracore Balancing
    12.4 Experiments and Results
      12.4.1 Evaluation Methodology
        12.4.1.1 Cycle-Accurate Simulator
        12.4.1.2 Simulated Models
      12.4.2 Results
        12.4.2.1 TDS Variants Comparison
        12.4.2.2 Impact of Load Balancing
        12.4.2.3 Sensitivity to Sparsity and Lf
        12.4.2.4 Comparison Against Past Approaches
        12.4.2.5 RTL Synthesis Results
    12.5 Chapter Summary
Part V HW/SW Co-Design and Co-Scheduling for CNN Acceleration
  Chapter 13 State-of-the-Art in HW/SW Co-Design and Co-Scheduling for CNN Acceleration
    13.1 HW/SW Co-Design
      13.1.1 Case Study: Cognitive IoT
      13.1.2 Recent Advancements in HW/SW Co-Design
    13.2 HW/SW Co-Scheduling
      13.2.1 Recent Advancements in HW/SW Co-Scheduling
    13.3 Chapter Summary
  Chapter 14 Hardware/Software Co-Design for CNN Acceleration
    14.1 Background of iMAC Accelerator
    14.2 Software Partition for iMAC Accelerator
      14.2.1 Channel Partition and Input/Weight Allocation to Hardware Accelerator
      14.2.2 Exploiting Parallelism Within Convolution Layer Operations
    14.3 Experimental Evaluations
    14.4 Chapter Summary
  Chapter 15 CPU-Accelerator Co-Scheduling for CNN Acceleration
    15.1 Background and Preliminaries
      15.1.1 Convolutional Neural Networks
      15.1.2 Baseline System Architecture
    15.2 CNN Acceleration with CPU-Accelerator Co-Scheduling
      15.2.1 Overview
      15.2.2 Linear Regression-Based Latency Model
        15.2.2.1 Accelerator Latency Model
        15.2.2.2 CPU Latency Model
      15.2.3 Channel Distribution
      15.2.4 Prototype Implementation
    15.3 Experimental Results
      15.3.1 Latency Model Accuracy
      15.3.2 Performance
      15.3.3 Energy
      15.3.4 Case Study: Tiny Darknet CNN Inferences
    15.4 Chapter Summary
  Chapter 16 Conclusions
References
Index
EULA
