Balyan Blog

Pathological Voice Analysis

An introduction to the book "Pathological Voice Analysis" by David Zhang and Kebin Wu (of Huawei Technologies), published by Springer Singapore in 2020. The book is available in PDF format, in English. "Pathological Voice Analysis" is filed under Uncategorized.

While voice is widely used in speech recognition and speaker identification, its application in biomedical fields is much less common. This book systematically introduces the authors' research on voice analysis for biomedical applications, particularly pathological voice analysis. First, it reviews the field to highlight the biomedical value of voice. It then offers a comprehensive overview of the workflow and aspects of pathological voice analysis, including voice acquisition systems, voice pitch estimation methods, glottal closure instant detection, feature extraction and learning, and multi-audio fusion approaches. Lastly, it discusses the experimental results that have shown the superiority of these techniques. This book is useful to researchers, professionals and postgraduate students working in fields such as speech signal processing, pattern recognition, and biomedical engineering. It is also a valuable resource for those involved in interdisciplinary research. (Description provided by the publisher.)

Contents

Preface

Chapter 1: Introduction
  1.1 Pathological Voice Analysis
  1.2 Computerized Voice Analysis
    1.2.1 Introduction
    1.2.2 Biomedical Value of Voice
      1.2.2.1 Mechanism of Voice Production
      1.2.2.2 Voice and Diseases
        Voice and Diseases in Nerve System
        Voice and Diseases in Respiratory System
        Voice and Diseases in Vocal Folds and Vocal Tract
        Necessity of Computerized Voice Analysis
    1.2.3 Present Situation of Computerized Voice Analysis
      1.2.3.1 Voice Database
      1.2.3.2 Feature Extraction
      1.2.3.3 Classification and Regression
      1.2.3.4 Deep Learning Based Computerized Voice Analysis
      1.2.3.5 State-of-the-Art Result on the Publicly Available Dataset
    1.2.4 Open Challenges in Computerized Voice Analysis
      1.2.4.1 Data Level Challenges
      1.2.4.2 Algorithm Level Challenges
    1.2.5 Future Directions in Computerized Voice Analysis
    1.2.6 Summary
  References

Chapter 2: Pathological Voice Acquisition
  2.1 Introduction
  2.2 Database Description
  2.3 Experimental Results
    2.3.1 Metric 1: Information Entropy (Signal Level)
    2.3.2 Metric 2: Reconstruction Error (Signal Level)
    2.3.3 Metric 3: Feature Correlation (Feature Level)
    2.3.4 Metric 4: Classification Accuracy (System Level)
    2.3.5 Metric 5: Computational Cost (Hardware Level)
    2.3.6 Metric 6: Storage Cost (Hardware Level)
    2.3.7 Discussion and Guideline on the Sampling Rate Selection
  2.4 Summary
  References

Chapter 3: Pitch Estimation
  3.1 Introduction
  3.2 Related Works
  3.3 Harmonics Enhancement for Pitch Estimation
    3.3.1 Motivation for Harmonics Enhancement
    3.3.2 Theoretical Analysis
    3.3.3 Implementation
      3.3.3.1 Enframe and FFT
      3.3.3.2 Self-Circular Convolution
      3.3.3.3 Baseline Removal
      3.3.3.4 Spectrum Superimposition
      3.3.3.5 Usage in Pitch Estimation
    3.3.4 In-Depth Analysis of the Proposed Algorithm
      3.3.4.1 Harmonics Structure and Pitch Estimation
      3.3.4.2 Feasibility Analysis
      3.3.4.3 Speeding Up Strategy and Time Complexity Analysis
  3.4 Experimental Result
    3.4.1 Basic Experimental Setting
    3.4.2 Performance for the Noisy Speeches
      3.4.2.1 HPS
      3.4.2.2 PEFAC
      3.4.2.3 SHRP
      3.4.2.4 BaNa
      3.4.2.5 Computational Complexity Comparison
    3.4.3 Extension to Music
    3.4.4 Discussion
  3.5 Summary
  References

Chapter 4: Glottal Closure Instants Detection
  4.1 Introduction
  4.2 Related Works
    4.2.1 Classical GCI Detection Methods
    4.2.2 Limitations
  4.3 GCI Detection Using TKEO
    4.3.1 TKEO and Its Relationship with GCI
    4.3.2 Absolute TKEO with Scale Parameter k
    4.3.3 Multiresolution Combination
    4.3.4 GMAT: GCI Detection Based on Multiresolution Absolute TKEO
  4.4 Experimental Results
    4.4.1 Basic Experimental Settings
    4.4.2 Performance Comparison
      4.4.2.1 GCI Detection for Clean Speech
      4.4.2.2 GCI Detection for Noisy Speech
      4.4.2.3 Analysis of GCI Refinement Scheme
      4.4.2.4 Comparison of Running Time
    4.4.3 GCIs Detection in Pathological Speech Identification
  4.5 Discussion
    4.5.1 Parameter Sensitivity
      4.5.1.1 Analysis of the Scale Number M
      4.5.1.2 Analysis of Window Length Tl
    4.5.2 GMAT and Low Frequency Noise
    4.5.3 Relation Between GMAT and MSM
    4.5.4 Different Pooling Methods in GMAT
  4.6 Summary
  References

Chapter 5: Feature Learning
  5.1 Introduction
  5.2 Related Works
  5.3 Proposed Feature Learning Method
    5.3.1 Preprocessing
    5.3.2 Mel-Spectrogram
    5.3.3 Dictionary Learning Based on Spherical K-Means
    5.3.4 Encoding and Pooling
  5.4 Experimental Results
    5.4.1 Dataset and Experimental Setup
    5.4.2 PD Detection with Learned and Hand-Crafted Features
    5.4.3 Ablation Study
      5.4.3.1 Clustering Number in Spherical K-Means
      5.4.3.2 Pooling Methods
  5.5 Summary
  References

Chapter 6: Joint Learning for Voice Based Disease Detection
  6.1 Introduction
  6.2 Related Works
    6.2.1 Notation
    6.2.2 Ridge Regression
    6.2.3 Low-Rank Ridge Regression
    6.2.4 ε-Dragging Technique
  6.3 Proposed Method
    6.3.1 Joint Learning with Label Relaxed Low-Rank Ridge Regression (JOLL4R)
    6.3.2 Algorithm to Solve JOLL4R
    6.3.3 Classification
  6.4 Experimental Results
    6.4.1 Dataset and Experimental Setup
    6.4.2 The Detection of Patients with Cordectomy
    6.4.3 The Detection of Patients with Frontolateral Resection
    6.4.4 Ablation Study
      6.4.4.1 Parameters Sensitivity Analysis
      6.4.4.2 The Effect of the ε-Dragging Technique
      6.4.4.3 The Effect of ADASYN
  6.5 Discussions
    6.5.1 Visualization of JOLL4R
    6.5.2 Classifier in JOLL4R
    6.5.3 Extension to Fusion of More Audios
  6.6 Summary
  References

Chapter 7: Robust Multi-View Discriminative Learning for Voice Based Disease Detection
  7.1 Introduction
  7.2 Proposed Method
    7.2.1 Notation
    7.2.2 The Objective Function
    7.2.3 Optimization of ROME-DLR
    7.2.4 The Classification Rule of ROME-DLR
  7.3 Experimental Results
    7.3.1 Dataset and Experimental Setup
    7.3.2 The Detection of RLNP
    7.3.3 The Detection of SD
    7.3.4 Ablation Study
      7.3.4.1 Regularization Parameters α and β
      7.3.4.2 Parameter τ
      7.3.4.3 Parameter λ
      7.3.4.4 The Influence of ADASYN
  7.4 Discussions
  7.5 Summary
  References

Chapter 8: Book Review and Future Work
  8.1 Book Recapitulation
  8.2 Future Work
  References

Index

All eight chapters are credited to David Zhang and Kebin Wu.