Big data analytics in chemoinformatics and bioinformatics : with applications to computer-aided drug design, cancer biology, emerging pathogens and computational toxicology

About the book "Big data analytics in chemoinformatics and bioinformatics : with applications to computer-aided drug design, cancer biology, emerging pathogens and computational toxicology" by Subhash C. Basak and Marjan Vračko, published by Elsevier - Health Sciences Division in 2022. The book is available in PDF format, in English. It is listed in the Uncategorized category.

Big Data Analytics in Chemoinformatics and Bioinformatics: With Applications to Computer-Aided Drug Design, Cancer Biology, Emerging Pathogens and Computational Toxicology provides an up-to-date presentation of big data analytics methods and their applications in diverse fields. The proper management of big data for decision-making in scientific and social issues is of paramount importance. This book gives researchers the tools they need to solve big data problems in these fields. It begins with a section on general topics that all readers will find useful and continues with specific sections covering a range of interdisciplinary applications. Here, an international team of leading experts review their respective fields and present their latest research findings, with case studies used throughout to analyze and present key information.

Brings together the current knowledge on the most important aspects of big data, including analysis using deep learning and fuzzy logic, transparency and data protection, disparate data analytics, and scalability of the big data domain
Covers many applications of big data analysis in diverse fields such as chemistry, chemoinformatics, bioinformatics, computer-assisted drug/vaccine design, characterization of emerging pathogens, and environmental protection
Highlights the considerable benefits offered by big data analytics to science, in biomedical fields and in industry

Big Data Analytics in Chemoinformatics and Bioinformatics
Preface
List of contributors
Copyright
Contents

1 Chemoinformatics and bioinformatics by discrete mathematics and numbers: an adventure from small data to the realm of eme...
1.1 Introduction
1.2 Chemobioinformatics—a confluence of disciplines?
1.2.1 Physical property: colligative versus constitutive
1.2.2 Early biochemical observations on the relationship between chemical structure and bioactivity of molecules
1.2.3 Linear free energy relationship: the multiparameter Hansch approach to quantitative structure–activity relationship
1.2.4 Chemical graph theory and quantum chemistry as the source of chemodescriptors
1.2.4.1 Topological indices—graph theoretic definitions and calculation methods
1.2.4.2 What do the topological indices represent about molecular structure?
1.3 Bioinformatics: quantitative informatics in the age of big biology
1.4 Major pillars of model building
1.5 Discussion
1.6 Conclusion
Acknowledgment
References

2 Robustness concerns in high-dimensional data analyses and potential solutions
2.1 Introduction
2.2 Sparse estimation in high-dimensional regression models
2.2.1 Starting of the era: the least absolute shrinkage and selection operator
2.2.2 Likelihood-based extensions of the LASSO
2.2.3 Search for a better penalty function
2.3 Robustness concerns for the penalized likelihood methods
2.4 Penalized M-estimation for robust high-dimensional analyses
2.5 Robust minimum divergence methods for high-dimensional regressions
2.5.1 The minimum penalized density power divergence estimator
2.5.2 Asymptotic properties of the MDPDE under high-dimensional GLMs
2.6 A real-life application: identifying important descriptors of amines for explaining their mutagenic activity
2.7 Concluding remarks
Appendix: A list of useful R-packages for high-dimensional data analysis
Acknowledgments
References

3 Fairness, explainability, privacy, and robustness for trustworthy algorithmic decision-making
3.1 Introduction
3.2 Fairness in machine learning
3.2.1 Fairness metrics and definitions
3.2.2 Bias mitigation in machine learning models
3.2.2.1 Preprocessing
3.2.2.2 In-processing
3.2.2.3 Postprocessing
3.2.3 Implementation
3.3 Explainable artificial intelligence
3.3.1 Formal objectives of explainable artificial intelligence
3.3.1.1 Why explain?
3.3.1.2 Terminologies
3.3.2 Taxonomy of methods
3.3.2.1 In-model versus post-model explanations
3.3.2.2 Global and local explanations
3.3.2.3 Causal explainability
3.3.3 Do explanations serve their purpose?
3.3.3.1 From explanation to understanding
3.3.3.2 Implementations and tools
3.4 Notions of algorithmic privacy
3.4.1 Preliminaries of differential privacy
3.4.2 Privacy-preserving methodology
3.4.2.1 Local sensitivity and other mechanisms
3.4.2.2 Algorithms with differential privacy guarantees
3.4.3 Generalizations, variants, and applications
3.4.3.1 Pufferfish
3.4.3.2 Other variations
3.4.3.3 Implementations
3.5 Robustness
3.5.1 Adversarial attacks
3.5.2 Defense mechanisms
3.5.2.1 Adversarial (re)training
3.5.2.2 Use of regularization
3.5.2.3 Certified defenses
3.5.3 Implementations
3.6 Discussion
References

4 How to integrate the “small and big” data into a complex adverse outcome pathway?
4.1 Introduction
4.2 State and review
4.3 Binding affinity to androgen nuclear receptor evaluated with respect to carcinogenic potency data
4.4 Conclusion and future directions
References

5 Big data and deep learning: extracting and revising chemical knowledge from data
5.1 Introduction
5.2 Basic methods in neural networks and deep learning
5.2.1 Neural networks
5.2.2 Neural network learning
5.2.3 Deep learning and multilayer neural networks
5.2.3.1 Convolutional neural network
5.2.3.2 Recurrent neural network
5.2.3.3 Graph convolutional neural networks
5.2.4 Attention mechanism
5.3 Neural networks for quantitative structure–activity relationship: input, output, and parameters
5.3.1 Input
5.3.2 Chemical graphs and their representation
5.3.2.1 SMILES as input
5.3.2.2 Images of two-dimensional structures as input
5.3.2.3 Chemical graphs as input
5.3.3 Output
5.3.4 Performance parameters
5.4 Deep learning models for mutagenicity prediction
5.4.1 Structure–activity relationship and quantitative structure–activity relationship models for Ames test
5.4.2 Deep learning models for Ames test
5.4.2.1 Learning from SMILES
5.4.2.2 Learning from images
5.4.2.3 Integrating features from SMILES and images
5.4.2.4 Learning from chemical graphs
5.5 Interpreting deep neural network models
5.5.1 Extracting substructures
5.5.2 Comparison of substrings with SARpy SAs
5.5.3 Comparison of substructures with Toxtree
5.6 Discussion and conclusions
5.6.1 A future for deep learning models
References

6 Retrosynthetic space modeled by big data descriptors
6.1 Introduction
6.2 Computer-assisted organic synthesis
6.2.1 Retrosynthetic space explored by molecular descriptors using big data sets
6.2.2 The exploration of chemical retrosynthetic space using retrosynthetic feasibility functions
6.3 Quantitative structure–activity relationship model
6.4 Dimensionality reduction using retrosynthetic analysis
6.5 Discussion
References

7 Approaching history of chemistry through big data on chemical reactions and compounds
7.1 Introduction
7.2 Computational history of chemistry
7.2.1 Data and tools
7.3 The expanding chemical space, a case study for computational history of chemistry
7.4 Conclusions
Acknowledgments
References

8 Combinatorial and quantum techniques for large data sets: hypercubes and halocarbons
8.1 Introduction
8.2 Combinatorial techniques for isomer enumerations to generate large datasets
8.2.1 Combinatorial techniques for large data structures
8.2.2 Möbius inversion
8.2.3 Combinatorial results
8.3 Quantum chemical techniques for large data sets
8.3.1 Computational techniques for halocarbons
8.3.2 Results and discussions of quantum computations and toxicity of halocarbons
8.4 Hypercubes and large datasets
8.5 Conclusion
References

9 Development of quantitative structure–activity relationship models based on electrophilicity index: a conceptual DFT-base...
9.1 Introduction
9.2 Theoretical background
9.3 Computational details
9.4 Methodology
9.5 Results and discussion
9.5.1 Tetrahymena pyriformis
9.5.2 Trypanosoma brucei
9.6 Conclusion
Acknowledgments
Conflict of interest
References

10 Pharmacophore-based virtual screening of large compound databases can aid “big data” problems in drug discovery
10.1 Introduction
10.2 Background of data analytics, machine learning, intelligent augmentation methods and applications in drug discovery
10.2.1 Applications of data analytics in drug discovery
10.2.2 Machine learning in drug discovery
10.2.3 Application of other computational approaches in drug discovery
10.2.4 Predictive drug discovery using molecular modeling
10.3 Pharmacophore modeling
10.3.1 Case studies
10.4 Concluding remarks
References

11 A new robust classifier to detect hot-spots and null-spots in protein–protein interface: validation of binding pocket an...
11.1 Introduction
11.2 Training and testing of the classifier
11.2.1 Variable selection using recursive feature elimination
11.2.2 Random forest performed best using both published and combined datasets
11.3 Technical details to develop novel protein–protein interaction hotspot prediction program
11.3.1 Training data
11.3.2 Building and validating a novel classifier by evaluating state-of-the-art feature selection and machine learning alg...
11.4 A case study
11.4.1 Identification of a druggable protein–protein interaction site between mutant p53 and its stabilizing chaperone DNAJ...
11.4.2 Building the homology model of DNAJA1 and optimizing the mutp53 (R175H) structure
11.4.3 Protein–protein docking
11.4.4 Small molecules inhibitors identification through drug-like library screening against the DNAJA1- mutp53R175H intera...
11.5 Discussion
Author contribution
Acknowledgment
Conflicts of interest
References

12 Mining big data in drug discovery—triaging and decision trees
12.1 Introduction
12.2 Big data in drug discovery
12.3 Triaging
12.4 Decision trees
12.5 Recursive partitioning
12.6 PhyloGenetic-like trees
12.7 Multidomain classification
12.8 Fuzzy trees and clustering
Acknowledgments
References

13 Use of proteomics data and proteomics-based biodescriptors in the estimation of bioactivity/toxicity of chemicals and na...
13.1 Introduction
13.2 Proteomics technologies and their toxicological applications
13.2.1 Two-dimensional gel electrophoresis
13.2.1.1 Information theoretic approach for the quantification of proteomics maps
13.2.1.2 Chemometric approach for the calculation of spectrum-like mathematical proteomics descriptors
13.2.2 Mass spectrometry-based proteomics technology and their applications in mathematical nanotoxicoproteomics
13.3 Discussion
Acknowledgment
References

14 Mapping interaction between big spaces; active space from protein structure and available chemical space
14.1 Introduction
14.2 Background
14.2.1 Navigating protein fold space
14.2.2 From amino acid string to dynamic structural fold
14.2.3 Elements for classification of protein
14.2.4 Available methods for classifying proteins
14.3 Protein topology for exploring structure space
14.3.1 Modularity in protein structure space
14.3.2 Data-driven approach to extract topological module
14.4 Scaffolds curve the functional and catalytic sites
14.4.1 Signature of catalytic site in protein structures
14.4.2 Protein function-based selection of topological space
14.4.3 Protein dynamics and transient sites
14.4.4 Learning methods for the prediction of proteins and functional sites
14.5 Protein interactive sites and designing of inhibitor
14.5.1 Interaction space exploration for energetically favorable binding features identification
14.5.2 Protein dynamics guided binding features selection
14.5.3 Protein flexibility and exploration of ligand recognition site
14.5.4 Artificial intelligence to understand the interactions of protein and chemical
14.6 Intrinsically unstructured regions and protein function
14.7 Conclusions
Acknowledgments
References

15 Artificial intelligence, big data and machine learning approaches in genome-wide SNP-based prediction for precision medi...
15.1 Introduction
15.2 Role of artificial intelligence and machine learning in medicine
15.3 Genome-wide SNP prediction
15.4 Artificial intelligence, precision medicine and drug discovery
15.5 Applications of artificial intelligence in disease prediction and analysis oncology
15.6 Cardiology
15.7 Neurology
15.8 Conclusion
Abbreviations
References

16 Applications of alignment-free sequence descriptors in the characterization of sequences in the age of big data: a case ...
16.1 Introduction
16.2 Section 1—bioinformatics today: problems now
16.2.1 What is bioinformatics and genomics?
16.2.2 Annotations
16.2.3 Evolution of sequencing methods
16.2.4 Alignment-free sequence descriptors
16.2.5 Metagenomics
16.2.6 Software development: scenario and challenges
16.2.7 Data formats
16.2.8 Storage and exchange
16.3 Section 2—bioinformatics today and tomorrow: sustainable solutions
16.3.1 The need for big data
16.3.1.1 Volume
16.3.1.2 Variety
16.3.2 Software and development
16.3.2.1 Support for huge volume
16.3.2.2 Optimal efficiency in storage
16.3.2.3 Good data recovery solution
16.3.2.4 Horizontal scaling
16.3.2.5 Cost effective
16.3.2.6 Ease of access and understanding
16.3.2.6.1 Why “Hadoop”?
16.3.2.6.2 What is Hadoop?
16.3.2.7 Overview of Hadoop distributed file system
16.3.2.8 Overview of MapReduce
16.3.2.9 Some problems with MapReduce
16.3.2.10 Apache Pig
16.3.2.11 Data formats
16.3.2.12 May I have some structured query language please?
16.3.2.13 Storage and exchange
16.3.2.14 Visualization
16.4 Summary
References

17 Scalable quantitative structure–activity relationship systems for predictive toxicology
17.1 Background
17.2 Scalability in quantitative structure–activity relationship modeling
17.2.1 Consequences of inability to scale
17.2.2 Expandability of the training dataset
17.2.3 Efficiency of data curation
17.2.4 Ability to handle stereochemistry
17.2.5 Ability to use proprietary training data
17.2.6 Ability to handle missing data
17.2.7 Ability to modify the descriptor set
17.2.8 Scaling expert rule-based systems
17.2.9 Scalability of adverse outcome pathway-based quantitative structure–activity relationship systems
17.2.10 Scalability of the supporting resources
17.2.11 Scalability of quantitative structure–activity relationships validation protocols
17.2.12 Scalability after deployment
17.2.13 Ability to use computer hardware resources effectively
17.3 Summary
References

18 From big data to complex network: a navigation through the maze of drug–target interaction
18.1 Introduction
18.2 Databases
18.2.1 Chemical databases
18.2.1.1 DrugBank
18.2.1.2 PubChem
18.2.1.3 ChEMBL
18.2.1.4 ChemSpider
18.2.2 Databases for targets
18.2.2.1 UniProt
18.2.2.2 Protein Data Bank
18.2.2.3 String
18.2.2.4 BindingDB
18.2.3 Databases for traditional Chinese medicine
18.2.3.1 Traditional Chinese medicine Database@Taiwan
18.2.3.2 Traditional Chinese medicine systems pharmacology
18.2.3.3 Traditional Chinese medicine integrated database
18.3 Prediction, construction, and analysis of drug–target network
18.3.1 Algorithms to predict drug–target interaction network
18.3.1.1 Machine learning-based methods
18.3.1.2 Similarity-based methods
18.3.2 Tools for network construction
18.3.2.1 Cytoscape
18.3.2.2 Pajek
18.3.2.3 Gephi
18.3.2.4 NetworkX
18.3.3 Network topological analysis
18.3.3.1 Degree distribution
18.3.3.2 Path and distance
18.3.3.3 Module and motifs
18.4 Conclusion and perspectives
Acknowledgments
References

19 Dissecting big RNA-Seq cancer data using machine learning to find disease-associated genes and the causal mechanism
19.1 Introduction
19.2 Bird’s eye view of the analysis of cancer RNA-Seq data using machine learning
19.3 Materials and methods
19.3.1 Preprocessing of the data
19.3.2 Feature selection
19.3.3 Classification learning
19.3.4 Extraction of disease-associated genes
19.3.5 Validation
19.4 Hand-in-hand walk with RNA-Seq data
19.4.1 Dataset selection
19.4.2 Data preprocessing
19.4.3 Feature selection
19.4.4 Classification model
19.4.5 Identification of the genes involved in disease progression
19.4.6 Significance of the identified deeply associated genes
19.5 Conclusion
References

Index

"Big Data Analytics in Chemoinformatics and Bioinformatics provides an up-to-date presentation of big data analytics methods and their applications in diverse fields. Various aspects of science, technology, and health care are affected by big data and associated prediction tools. The proper management of big data for decision-making in scientific and social issues is of paramount importance. This book gives researchers the tools they need to solve big data problems in these fields. The book begins with a section on general topics that all readers will find useful, and it continues with specific sections covering a range of interdisciplinary applications. An international team of leading experts review their respective fields and present their own latest research findings, and case studies which are utilized throughout to analyze and present key information."--Page 4 of cover
