Understanding bioinformatics

An introduction to the book "Understanding bioinformatics" by Baum, Jeremy O.; Zvelebil, Marketa, published by Garland Science/Taylor & Francis Group in 2008. The book is available in PDF format, in English. "Understanding bioinformatics" is listed under the category Uncategorized.

Suitable for advanced undergraduates and postgraduates, Understanding Bioinformatics provides a definitive guide to this vibrant and evolving discipline. The book takes a conceptual approach. It guides the reader from first principles through to an understanding of the computational techniques and the key algorithms. Understanding Bioinformatics is an invaluable companion for students from their first encounter with the subject through to more advanced studies.

The book is divided into seven parts, with the opening part introducing the basics of nucleic acids, proteins and databases. Subsequent parts are divided into 'Applications' and 'Theory' Chapters, allowing readers to focus their attention effectively. In each section, the Applications Chapter provides a fast and straightforward route to understanding the main concepts and 'getting started'. Each of these is then followed by Theory Chapters which give greater detail and present the underlying mathematics. In Part 2, Sequence Alignments, the Applications Chapter shows the reader how to get started on producing and analyzing sequence alignments, and using sequences for database searching, while the next two chapters look closely at the more advanced techniques and the mathematical algorithms involved. Part 3 covers evolutionary processes and shows how bioinformatics can be used to help build phylogenetic trees. Part 4 looks at the characteristics of whole genomes. In Parts 5 and 6 the focus turns to secondary and tertiary structure – predicting structural conformation and analysing structure-function relationships. The last part surveys methods of analyzing data from a set of genes or proteins of an organism and is rounded off with an overview of systems biology.

The writing style of Understanding Bioinformatics is notable for its clarity, while the extensive, full-color artwork has been designed to present the key concepts with simplicity and consistency. Each chapter uses mind-maps and flow diagrams to give an overview of the conceptual links within each topic.

COVER
PREFACE
A NOTE TO THE READER
LIST OF REVIEWERS
CONTENTS IN BRIEF
CONTENTS

PART 1 BACKGROUND BASICS

CHAPTER 1 THE NUCLEIC ACID WORLD
1.1 The Structure of DNA and RNA
- DNA is a linear polymer of only four different bases
- Two complementary DNA strands interact by base-pairing to form a double helix
- RNA molecules are mostly single stranded but can also have base-pair structures
1.2 DNA, RNA, and Protein: The Central Dogma
- DNA is the information store, but RNA is the messenger
- Messenger RNA is translated into protein according to the genetic code
- Translation involves transfer RNAs and RNA-containing ribosomes
1.3 Gene Structure and Control
- RNA polymerase binds to specific sequences that position it and identify where to begin transcription
- The signals initiating transcription in eukaryotes are generally more complex than those in bacteria
- Eukaryotic mRNA transcripts undergo several modifications prior to their use in translation
- The control of translation
1.4 The Tree of Life and Evolution
- A brief survey of the basic characteristics of the major forms of life
- Nucleic acid sequences can change as a result of mutation
Summary
Further Reading

CHAPTER 2 PROTEIN STRUCTURE
2.1 Primary and Secondary Structure
- Protein structure can be considered on several different levels
- Amino acids are the building blocks of proteins
- The differing chemical and physical properties of amino acids are due to their side chains
- Amino acids are covalently linked together in the protein chain by peptide bonds
- Secondary structure of proteins is made up of α-helices and β-strands
- Several different types of β-sheet are found in protein structures
- Turns, hairpins, and loops connect helices and strands
2.2 Implication for Bioinformatics
- Certain amino acids prefer a particular structural unit
- Evolution has aided sequence analysis
- Visualization and computer manipulation of protein structures
2.3 Proteins Fold to Form Compact Structures
- The tertiary structure of a protein is defined by the path of the polypeptide chain
- The stable folded state of a protein represents a state of low energy
- Many proteins are formed of multiple subunits
Summary
Further Reading

CHAPTER 3 DEALING WITH DATABASES
3.1 The Structure of Databases
- Flat-file databases store data as text files
- Relational databases are widely used for storing biological information
- XML has the flexibility to define bespoke data classifications
- Many other database structures are used for biological data
- Databases can be accessed locally or online and often link to each other
3.2 Types of Database
- There’s more to databases than just data
- Primary and derived data
- How we define and connect things is important: Ontologies
3.3 Looking for Databases
- Sequence databases
- Microarray databases
- Protein interaction databases
- Structural databases
3.4 Data Quality
- Nonredundancy is especially important for some applications of sequence databases
- Automated methods can be used to check for data consistency
- Initial analysis and annotation is usually automated
- Human intervention is often required to produce the highest quality annotation
- The importance of updating databases and entry identifier and version numbers
Summary
Further Reading

PART 2 SEQUENCE ALIGNMENTS

CHAPTER 4 PRODUCING AND ANALYZING SEQUENCE ALIGNMENTS
4.1 Principles of Sequence Alignment
- Alignment is the task of locating equivalent regions of two or more sequences to maximize their similarity
- Alignment can reveal homology between sequences
- It is easier to detect homology when comparing protein sequences than when comparing nucleic acid sequences
4.2 Scoring Alignments
- The quality of an alignment is measured by giving it a quantitative score
- The simplest way of quantifying similarity between two sequences is percentage identity
- The dot-plot gives a visual assessment of similarity based on identity
- Genuine matches do not have to be identical
- There is a minimum percentage identity that can be accepted as significant
- There are many different ways of scoring an alignment
4.3 Substitution Matrices
- Substitution matrices are used to assign individual scores to aligned sequence positions
- The PAM substitution matrices use substitution frequencies derived from sets of closely related protein sequences
- The BLOSUM substitution matrices use mutation data from highly conserved local regions of sequence
- The choice of substitution matrix depends on the problem to be solved
4.4 Inserting Gaps
- Gaps inserted in a sequence to maximize similarity with another require a scoring penalty
- Dynamic programming algorithms can determine the optimal introduction of gaps
4.5 Types of Alignment
- Different kinds of alignments are useful in different circumstances
- Multiple sequence alignments enable the simultaneous comparison of a set of similar sequences
- Multiple alignments can be constructed by several different techniques
- Multiple alignments can improve the accuracy of alignment for sequences of low similarity
- ClustalW can make global multiple alignments of both DNA and protein sequences
- Multiple alignments can be made by combining a series of local alignments
- Alignment can be improved by incorporating additional information
4.6 Searching Databases
- Fast yet accurate search algorithms have been developed
- FASTA is a fast database-search method based on matching short identical segments
- BLAST is based on finding very similar short segments
- Different versions of BLAST and FASTA are used for different problems
- PSI-BLAST enables profile-based database searches
- SSEARCH is a rigorous alignment method
4.7 Searching with Nucleic Acid or Protein Sequences
- DNA or RNA sequences can be used either directly or after translation
- The quality of a database match has to be tested to ensure that it could not have arisen by chance
- Choosing an appropriate E-value threshold helps to limit a database search
- Low-complexity regions can complicate homology searches
- Different databases can be used to solve particular problems
4.8 Protein Sequence Motifs or Patterns
- Creation of pattern databases requires expert knowledge
- The BLOCKS database contains automatically compiled short blocks of conserved multiply aligned protein sequences
4.9 Searching Using Motifs and Patterns
- The PROSITE database can be searched for protein motifs and patterns
- The pattern-based program PHI-BLAST searches for both homology and matching motifs
- Patterns can be generated from multiple sequences using PRATT
- The PRINTS database consists of fingerprints representing sets of conserved motifs that describe a protein family
- The Pfam database defines profiles of protein families
4.10 Patterns and Protein Function
- Searches can be made for particular functional sites in proteins
- Sequence comparison is not the only way of analyzing protein sequences
Summary
Further Reading

CHAPTER 5 PAIRWISE SEQUENCE ALIGNMENT AND DATABASE SEARCHING
5.1 Substitution Matrices and Scoring
- Alignment scores attempt to measure the likelihood of a common evolutionary ancestor
- The PAM (MDM) substitution scoring matrices were designed to trace the evolutionary origins of proteins
- The BLOSUM matrices were designed to find conserved regions of proteins
- Scoring matrices for nucleotide sequence alignment can be derived in similar ways
- The substitution scoring matrix used must be appropriate to the specific alignment problem
- Gaps are scored in a much more heuristic way than substitutions
5.2 Dynamic Programming Algorithms
- Optimal global alignments are produced using efficient variations of the Needleman−Wunsch algorithm
- Local and suboptimal alignments can be produced by making small modifications to the dynamic programming algorithm
- Time can be saved with a loss of rigor by not calculating the whole matrix
5.3 Indexing Techniques and Algorithmic Approximations
- Suffix trees locate the positions of repeats and unique sequences
- Hashing is an indexing technique that lists the starting positions of all k-tuples
- The FASTA algorithm uses hashing and chaining for fast database searching
- The BLAST algorithm makes use of finite-state automata
- Comparing a nucleotide sequence directly with a protein sequence requires special modifications to the BLAST and FASTA algorithms
5.4 Alignment Score Significance
- The statistics of gapped local alignments can be approximated by the same theory
5.5 Aligning Complete Genome Sequences
- Indexing and scanning whole genome sequences efficiently is crucial for the sequence alignment of higher organisms
- The complex evolutionary relationships between the genomes of even closely related organisms require novel alignment algorithms
Summary
Further Reading

CHAPTER 6 PATTERNS, PROFILES, AND MULTIPLE ALIGNMENTS
6.1 Profiles and Sequence Logos
- Position-specific scoring matrices are an extension of substitution scoring matrices
- Methods for overcoming a lack of data in deriving the values for a PSSM
- PSI-BLAST is a sequence database searching program
- Representing a profile as a logo
6.2 Profile Hidden Markov Models
- The basic structure of HMMs used in sequence alignment to profiles
- Estimating HMM parameters using aligned sequences
- Scoring a sequence against a profile HMM: The most probable path and the sum over all paths
- Estimating HMM parameters using unaligned sequences
6.3 Aligning Profiles
- Comparing two PSSMs by alignment
- Aligning profile HMMs
6.4 Multiple Sequence Alignments by Gradual Sequence Addition
- The order in which sequences are added is chosen based on the estimated likelihood of incorporating errors in the alignment
- Many different scoring schemes have been used in constructing multiple alignments
- The multiple alignment is built using the guide tree and profile methods and may be further refined
6.5 Other Ways of Obtaining Multiple Alignments
- The multiple sequence alignment program DIALIGN aligns ungapped blocks
- The SAGA method of multiple alignment uses a genetic algorithm
6.6 Sequence Pattern Discovery
- Discovering patterns in a multiple alignment: eMOTIF and AACC
- Probabilistic searching for common patterns in sequences: Gibbs and MEME
- Searching for more general sequence patterns
Summary
Further Reading

PART 3 EVOLUTIONARY PROCESSES

CHAPTER 7 RECOVERING EVOLUTIONARY HISTORY
7.1 The Structure and Interpretation of Phylogenetic Trees
- Phylogenetic trees reconstruct evolutionary relationships
- Tree topology can be described in several ways
- Consensus and condensed trees report the results of comparing tree topologies
7.2 Molecular Evolution and its Consequences
- Most related sequences have many positions that have mutated several times
- The rate of accepted mutation is usually not the same for all types of base substitution
- Different codon positions have different mutation rates
- Only orthologous genes should be used to construct species phylogenetic trees
- Major changes affecting large regions of the genome are surprisingly common
7.3 Phylogenetic Tree Reconstruction
- Small ribosomal subunit rRNA sequences are well suited to reconstructing the evolution of species
- The choice of the method for tree reconstruction depends to some extent on the size and quality of the dataset
- A model of evolution must be chosen to use with the method
- All phylogenetic analyses must start with an accurate multiple alignment
- Phylogenetic analyses of a small dataset of 16S RNA sequence data
- Building a gene tree for a family of enzymes can help to identify how enzymatic functions evolved
Summary
Further Reading

CHAPTER 8 BUILDING PHYLOGENETIC TREES
8.1 Evolutionary Models and the Calculation of Evolutionary Distance
- A simple but inaccurate measure of evolutionary distance is the p-distance
- The Poisson distance correction takes account of multiple mutations at the same site
- The Gamma distance correction takes account of mutation rate variation at different sequence positions
- The Jukes−Cantor model reproduces some basic features of the evolution of nucleotide sequences
- More complex models distinguish between the relative frequencies of different types of mutation
- There is a nucleotide bias in DNA sequences
- Models of protein-sequence evolution are closely related to the substitution matrices used for sequence alignment
8.2 Generating Single Phylogenetic Trees
- Clustering methods produce a phylogenetic tree based on evolutionary distances
- The UPGMA method assumes a constant molecular clock and produces an ultrametric tree
- The Fitch–Margoliash method produces an unrooted additive tree
- The neighbor-joining method is related to the concept of minimum evolution
- Stepwise addition and star-decomposition methods are usually used to generate starting trees for further exploration, not the final tree
8.3 Generating Multiple Tree Topologies
- The branch-and-bound method greatly improves the efficiency of exploring tree topology
- Optimization of tree topology can be achieved by making a series of small changes to an existing tree
- Finding the root gives a phylogenetic tree a direction in time
8.4 Evaluating Tree Topologies
- Functions based on evolutionary distances can be used to evaluate trees
- Unweighted parsimony methods look for the trees with the smallest number of mutations
- Mutations can be weighted in different ways in the parsimony method
- Trees can be evaluated using the maximum likelihood method
- The quartet-puzzling method also involves maximum likelihood in the standard implementation
- Bayesian methods can also be used to reconstruct phylogenetic trees
8.5 Assessing the Reliability of Tree Features and Comparing Trees
- The long-branch attraction problem can arise even with perfect data and methodology
- Tree topology can be tested by examining the interior branches
- Tests have been proposed for comparing two or more alternative trees
Summary
Further Reading

PART 4 GENOME CHARACTERISTICS

CHAPTER 9 REVEALING GENOME FEATURES
9.1 Preliminary Examination of Genome Sequence
- Whole genome sequences can be split up to simplify gene searches
- Structural RNA genes and repeat sequences can be excluded from further analysis
- Homology can be used to identify genes in both prokaryotic and eukaryotic genomes
9.2 Gene Prediction in Prokaryotic Genomes
9.3 Gene Prediction in Eukaryotic Genomes
- Programs for predicting exons and introns use a variety of approaches
- Gene predictions must preserve the correct reading frame
- Some programs search for exons using only the query sequence and a model for exons
- Some programs search for genes using only the query sequence and a gene model
- Genes can be predicted using a gene model and sequence similarity
- Genomes of related organisms can be used to improve gene prediction
9.4 Splice Site Detection
- Splice sites can be detected independently by specialized programs
9.5 Prediction of Promoter Regions
- Prokaryotic promoter regions contain relatively well-defined motifs
- Eukaryotic promoter regions are typically more complex than prokaryotic promoters
- A variety of promoter-prediction methods are available online
- Promoter prediction results are not very clear-cut
9.6 Confirming Predictions
- There are various methods for calculating the accuracy of gene-prediction programs
- Translating predicted exons can confirm the correctness of the prediction
- Constructing the protein and identifying homologs
9.7 Genome Annotation
- Genome annotation is the final step in genome analysis
- Gene ontology provides a standard vocabulary for gene annotation
9.8 Large Genome Comparisons
Summary
Further Reading

CHAPTER 10 GENE DETECTION AND GENOME ANNOTATION
10.1 Detection of Functional RNA Molecules Using Decision Trees
- Detection of tRNA genes using the tRNAscan algorithm
- Detection of tRNA genes in eukaryotic genomes
10.2 Features Useful for Gene Detection in Prokaryotes
10.3 Algorithms for Gene Detection in Prokaryotes
- GeneMark uses inhomogeneous Markov chains and dicodon statistics
- GLIMMER uses interpolated Markov models of coding potential
- ORPHEUS uses homology, codon statistics, and ribosome-binding sites
- GeneMark.hmm uses explicit state duration hidden Markov models
- EcoParse is an HMM gene model
10.4 Features Used in Eukaryotic Gene Detection
- Differences between prokaryotic and eukaryotic genes
- Introns, exons, and splice sites
- Promoter sequences and binding sites for transcription factors
10.5 Predicting Eukaryotic Gene Signals
- Detection of core promoter binding signals is a key element of some eukaryotic gene-prediction methods
- A set of models has been designed to locate the site of core promoter sequence signals
- Predicting promoter regions from general sequence properties can reduce the numbers of false-positive results
- Predicting eukaryotic transcription and translation start sites
- Translation and transcription stop signals complete the gene definition
10.6 Predicting Exons and Introns
- Exons can be identified using general sequence properties
- Splice-site prediction
- Splice sites can be predicted by sequence patterns combined with base statistics
- GenScan uses a combination of weight matrices and decision trees to locate splice sites
- GeneSplicer predicts splice sites using first-order Markov chains
- NetPlantGene combines neural networks with intron and exon predictions to predict splice sites
- Other splicing features may yet be exploited for splice-site prediction
- Specific methods exist to identify initial and terminal exons
- Exons can be defined by searching databases for homologous regions
10.7 Complete Eukaryotic Gene Models
10.8 Beyond the Prediction of Individual Genes
- Functional annotation
- Comparison of related genomes can help resolve uncertain predictions
- Evaluation and reevaluation of gene-detection methods
Summary
Further Reading

PART 5 SECONDARY STRUCTURES

CHAPTER 11 OBTAINING SECONDARY STRUCTURE FROM SEQUENCE
11.1 Types of Prediction Methods
- Statistical methods are based on rules that give the probability that a residue will form part of a particular secondary structure
- Nearest-neighbor methods are statistical methods that incorporate additional information about protein structure
- Machine-learning approaches to secondary structure prediction mainly make use of neural networks and HMM methods
11.2 Training and Test Databases
- There are several ways to define protein secondary structures
11.3 Assessing the Accuracy of Prediction Programs
- Q3 measures the accuracy of individual residue assignments
- Secondary structure predictions should not be expected to reach 100% residue accuracy
- The Sov value measures the prediction accuracy for whole elements
- CAFASP/CASP: Unbiased and readily available protein prediction assessments
11.4 Statistical and Knowledge-Based Methods
- The GOR method uses an information theory approach
- The program Zpred includes multiple alignment of homologous sequences and residue conservation information
- There is an overall increase in prediction accuracy using multiple sequence information
- The nearest-neighbor method: The use of multiple nonhomologous sequences
- PREDATOR is a combined statistical and knowledge-based program that includes the nearest-neighbor approach
11.5 Neural Network Methods of Secondary Structure Prediction
- Assessing the reliability of neural net predictions
- Several examples of Web-based neural network secondary structure prediction programs
- PROF: Protein forecasting
- PSIPRED
- Jnet: Using several alternative representations of the sequence alignment
11.6 Some Secondary Structures Require Specialized Prediction Methods
- Transmembrane proteins
- Quantifying the preference for a membrane environment
11.7 Prediction of Transmembrane Protein Structure
- Multi-helix membrane proteins
- A selection of prediction programs to predict transmembrane helices
- Statistical methods
- Knowledge-based prediction
- Evolutionary information from protein families improves the prediction
- Neural nets in transmembrane prediction
- Predicting transmembrane helices with hidden Markov models
- Comparing the results: What to choose
- What happens if a non-transmembrane protein is submitted to transmembrane prediction programs
- Prediction of transmembrane structure containing β-strands
11.8 Coiled-Coil Structures
- The COILS prediction program
- PAIRCOIL and MULTICOIL are an extension of the COILS algorithm
- Zipping the leucine zipper: A specialized coiled coil
11.9 RNA Secondary Structure Prediction
Summary
Further Reading

CHAPTER 12 PREDICTING SECONDARY STRUCTURES
12.1 Defining Secondary Structure and Prediction Accuracy
- The definitions used for automatic protein secondary structure assignment do not give identical results
- There are several different measures of the accuracy of secondary structure prediction
12.2 Secondary Structure Prediction Based on Residue Propensities
- Each structural state has an amino acid preference which can be assigned as a residue propensity
- The simplest prediction methods are based on the average residue propensity over a sequence window
- Residue propensities are modulated by nearby sequence
- Predictions can be significantly improved by including information from homologous sequences
12.3 The Nearest-Neighbor Methods are Based on Sequence Segment Similarity
- Short segments of similar sequence are found to have similar structure
- Several sequence similarity measures have been used to identify nearest-neighbor segments
- A weighted average of the nearest-neighbor segment structures is used to make the prediction
- A nearest-neighbor method has been developed to predict regions with a high potential to misfold
12.4 Neural Networks Have Been Employed Successfully for Secondary Structure Prediction
- Layered feed-forward neural networks can transform a sequence into a structural prediction
- Inclusion of information on homologous sequences improves neural network accuracy
- More complex neural nets have been applied to predict secondary and other structural features
12.5 Hidden Markov Models Have Been Applied to Structure Prediction
- HMM methods have been found especially effective for transmembrane proteins
- Nonmembrane protein secondary structures can also be successfully predicted with HMMs
12.6 General Data Classification Techniques can Predict Structural Features
- Support vector machines have been successfully used for protein structure prediction
- Discriminants, SOMs, and other methods have also been used
Summary
Further Reading

PART 6 TERTIARY STRUCTURES

CHAPTER 13 MODELING PROTEIN STRUCTURE
13.1 Potential Energy Functions and Force Fields
- The conformation of a protein can be visualized in terms of a potential energy surface
- Conformational energies can be described by simple mathematical functions
- Similar force fields can be used to represent conformational energies in the presence of averaged environments
- Potential energy functions can be used to assess a modeled structure
- Energy minimization can be used to refine a modeled structure and identify local energy minima
- Molecular dynamics and simulated annealing are used to find global energy minima
13.2 Obtaining a Structure by Threading
- The prediction of protein folds in the absence of known structural homologs
- Libraries or databases of nonredundant protein folds are used in threading
- Two distinct types of scoring schemes have been used in threading methods
- Dynamic programming methods can identify optimal alignments of target sequences and structural folds
- Several methods are available to assess the confidence to be put on the fold prediction
- The C2-like domain from the Dictyostelia: A practical example of threading
13.3 Principles of Homology Modeling
- Closely related target and template sequences give better models
- Significant sequence identity depends on the length of the sequence
- Homology modeling has been automated to deal with the numbers of sequences that can now be modeled
- Model building is based on a number of assumptions
13.4 Steps in Homology Modeling
- Structural homologs to the target protein are found in the PDB
- Accurate alignment of target and template sequences is essential for successful modeling
- The structurally conserved regions of a protein are modeled first
- The modeled core is checked for misfits before proceeding to the next stage
- Sequence realignment and remodeling may improve the structure
- Insertions and deletions are usually modeled as loops
- Nonidentical amino acid side chains are modeled mainly by using rotamer libraries
- Energy minimization is used to relieve structural errors
- Molecular dynamics can be used to explore possible conformations for mobile loops
- Models need to be checked for accuracy
- How far can homology models be trusted?
13.5 Automated Homology Modeling
- The program MODELLER models by satisfying protein structure constraints
- COMPOSER uses fragment-based modeling to automatically generate a model
- Automated methods available on the Web for comparative modeling
- Assessment of structure prediction
13.6 Homology Modeling of PI3 Kinase p110α
- Swiss-Pdb Viewer can be used for manual or semi-manual modeling
- Alignment, core modeling, and side-chain modeling are carried out all in one
- The loops are modeled from a database of possible structures
- Energy minimization and quality inspection can be carried out within Swiss-Pdb Viewer
- MolIDE is a downloadable semi-automatic modeling package
- Automated modeling on the Web illustrated with p110α kinase
- Modeling a functionally related but sequentially dissimilar protein: mTOR
- Generating a multidomain three-dimensional structure from sequence
Summary
Further Reading

CHAPTER 14 ANALYZING STRUCTURE−FUNCTION RELATIONSHIPS
14.1 Functional Conservation
- Functional regions are usually structurally conserved
- Similar biochemical function can be found in proteins with different folds
- Fold libraries identify structurally similar proteins regardless of function
14.2 Structure Comparison Methods
- Finding domains in proteins aids structure comparison
- Structural comparisons can reveal conserved functional elements not discernible from a sequence comparison
- The CE method builds up a structural alignment from pairs of aligned protein segments
- The Vector Alignment Search Tool (VAST) aligns secondary structural elements
- DALI identifies structure superposition without maintaining segment order
- FATCAT introduces rotations between rigid segments
14.3 Finding Binding Sites
- Highly conserved, strongly charged, or hydrophobic surface areas may indicate interaction sites
- Searching for protein–protein interactions using surface properties
- Surface calculations highlight clefts or holes in a protein that may serve as binding sites
- Looking at residue conservation can identify binding sites
14.4 Docking Methods and Programs
- Simple docking procedures can be used when the structure of a homologous protein bound to a ligand analog is known
- Specialized docking programs will automatically dock a ligand to a structure
- Scoring functions are used to identify the most likely docked ligand
- The DOCK program is a semirigid-body method that analyzes shape and chemical complementarity of ligand and binding site
- Fragment docking identifies potential substrates by predicting types of atoms and functional groups in the binding area
- GOLD is a flexible docking program, which utilizes a genetic algorithm
- The water molecules in binding sites should also be considered
Summary
Further Reading

PART 7 CELLS AND ORGANISMS

CHAPTER 15 PROTEOME AND GENE EXPRESSION ANALYSIS
15.1 Analysis of Large-scale Gene Expression
- The expression of large numbers of different genes can be measured simultaneously by DNA microarrays
- Gene expression microarrays are mainly used to detect differences in gene expression in different conditions
- Serial analysis of gene expression (SAGE) is also used to study global patterns of gene expression
- Digital differential display uses bioinformatics and statistics to detect differential gene expression in different tissues
- Facilitating the integration of data from different places and experiments
- The simplest method of analyzing gene expression microarray data is hierarchical cluster analysis
- Techniques based on self-organizing maps can be used for analyzing microarray data
- Self-organizing tree algorithms (SOTAs) cluster from the top down by successive subdivision of clusters
- Clustered gene expression data can be used as a tool for further research
15.2 Analysis of Large-scale Protein Expression
- Two-dimensional gel electrophoresis is a method for separating the individual proteins in a cell
- Measuring the expression levels shown in 2D gels
- Differences in protein expression levels between different


Download the book Understanding bioinformatics