Algorithms for Big Data : DFG Priority Program 1736

معرفی کتاب «Algorithms for Big Data : DFG Priority Program 1736» نوشتهٔ Hannah Bast, Claudius Korzen, Ulrich Meyer, Manuel Penschuck، منتشرشده توسط نشر Springer Nature Switzerland AG در سال 1320. این کتاب در فرمت pdf، زبان انگلیسی ارائه شده است. «Algorithms for Big Data : DFG Priority Program 1736» در دستهٔ بدون دسته‌بندی قرار دارد.

This open access book surveys the progress in addressing selected challenges related to the growth of big data in combination with increasingly complicated hardware. It emerged from a research program established by the German Research Foundation (DFG) as priority program SPP 1736 on Algorithmics for Big Data where researchers from theoretical computer science worked together with application experts in order to tackle problems in domains such as networking, genomics research, and information retrieval. Such domains are unthinkable without substantial hardware and software support, and these systems acquire, process, exchange, and store data at an exponential rate. The chapters of this volume summarize the results of projects realized within the program and survey-related work. This is an open access book. Preface About the SPP 1736 About This Book Organization Contents Algorithms for Large and Complex Networks Algorithms for Large-Scale Network Analysis and the NetworKit Toolkit 1 Introduction 2 NetworKit—An Overview 2.1 Design Considerations 2.2 Ecosystem 3 Centrality Algorithms 3.1 Individual Centrality Scores 3.2 Improving One's Own Centrality 3.3 Group Centrality Optimization 4 Community Detection 5 Graph Sparsification 6 Graph Drawing and Network Data Visualization 7 Conclusions References Generating Synthetic Graph Data from Random Network Models 1 Motivation 1.1 Structure 1.2 Notation 1.3 Models of Computation 2 Random Graphs and the G(n,p) and G(n,m) Models 2.1 Sampling from G(n,p) and G(n,m) 3 Preferential Attachment 4 R-MAT 5 Simple Graphs from Prescribed Degree Sequence 5.1 The Edge Switching Markov Chain Model 5.2 Curveball 6 Geometrically Embedded Random Graphs 6.1 Efficient Generators Based on Geometric Data Structures 6.2 A Fast and Memory-Efficient Streaming Generator for RHG 6.3 Communication-Agnostic Generators for RHG 6.4 GIRG-Based Generator 7 Software Packages References Increasing the Sampling Efficiency for the Link Assessment Problem 1 Introduction 2 Link Assessment Based on z* 2.1 Ground Truth and PPVk 2.2 Random Graph Models 2.3 Co-occurrence Gradient in the FDSM 3 The Benchmarking Problem 4 Edge Switching vs. Curveball 4.1 Perturbation Score 4.2 Runtime Comparison with NetworKit 5 Phase Transition and Heuristics 5.1 Heuristics for MCMC Parameters 5.2 Phase Transitions as Mixing Quality Estimation 6 Summary and Conclusion References A Custom Hardware Architecture for the Link Assessment Problem 1 Introduction 2 Basics of Hardware and Systems Design 2.1 Hardware/Software System Design Flow 2.2 FPGA Basics 3 Hardware Architectures for the Link Assessment Computation 3.1 Data Structures 3.2 Memory Boundedness 3.3 Co-occurrence Calculation 3.4 Swap Randomization 4 Performance Comparison 5 Conclusion References Graph-Based Methods for Rational Drug Design 1 Introduction 2 Preliminaries 3 Molecular Similarity Based on Graphs 3.1 Challenges and Approaches in Comparing Molecular Graphs 3.2 Maximum Similar Subgraph Based Similarities for Molecules 4 Clustering Analysis 4.1 Challenges and Approaches in Molecular Cluster Analysis 4.2 StruClus: Scalable Structural Graph Set Clustering 5 Rational Drug Design Applications 5.1 Scaffold Hunter 5.2 BRD4 5.3 Chipmunk 6 Conclusion and Outlook References Recent Advances in Practical Data Reduction 1 Introduction 2 Recent Advances for NP-Hard Problems 2.1 Maximum Independent Set and Minimum Vertex Cover 2.2 Finding and Enumerating Maximum Cliques 2.3 Maximum Cuts 2.4 Treewidth and Treedepth 2.5 Hitting Set 2.6 Steiner Trees 2.7 Minimum Fill-In 2.8 Vertex Coloring 2.9 Cluster Editing 2.10 Multiterminal Cut 3 Recent Advances for Problems in P 3.1 Minimum Cut 3.2 Matching 4 Engineering Techniques 5 Open Problems and Future Work References Skeleton-Based Clustering by Quasi-Threshold Editing 1 Introduction 2 Preliminaries 2.1 Motivation 2.2 Related Work 3 Quasi-Threshold Mover (QTM) 3.1 Linear Running Time 3.2 Sorting Simple Paths 3.3 Inclusion-Minimal Editing 3.4 Randomized Choices 4 Experimental Evaluation 4.1 Sorting Paths and Randomization 4.2 Initialization and Convergence 4.3 Running Time 5 Conclusion References The Space Complexity of Undirected Graph Exploration 1 Introduction 2 Exploration and Feasibility 3 Trapping a Single Agent 4 Reingold's Algorithm 5 Trapping Multiple Agents 6 Multi-agent Exploration 7 Bibliographic Notes References Algorithms for Big Data and Their Applications Scalable Cryptography 1 Introduction and Motivation 2 Tightly Secure Cryptography 3 Post-Quantum Cryptography 4 Open Questions References Distributed Data Streams 1 Introduction 2 Model 2.1 Problems 2.2 Filter-Based Algorithms 2.3 Computational Primitives 3 Top-k-Value Monitoring 3.1 Dynamic Distributed Data Structure 3.2 Filter-Based Algorithm 4 Top-k-Position Monitoring 4.1 Filter-based Top-k-Position Monitoring 4.2 Filter-Based Approx. Top-k-Position Monitoring 4.3 Approximate Offline Algorithm 5 (Approximate) Count Distinct Monitoring 5.1 Computation for One Time Step 5.2 Monitoring over Multiple Time Steps 6 Conclusion References Energy-Efficient Scheduling 1 Introduction 2 Dynamic Speed Scaling 2.1 Speed Scaling on Heterogeneous Processors 2.2 The Offline Problem with General Power Functions 2.3 The Offline Problem with Standard Power Functions 2.4 An Online Algorithm 2.5 Further Results 3 Power-Down Mechanisms in Data Centers 3.1 Heterogeneous Servers 3.2 Servers with Two States 3.3 Servers with Multiple States 3.4 Homogeneous Servers References The GENO Software Stack 1 Introduction 2 Modeling Language 3 Generic Optimizer 3.1 Solver for Bound-Constrained Smooth Problems 3.2 Solver for Constrained Smooth Problems 4 Matrix and Tensor Calculus 4.1 Problems with Matrix Notation 4.2 Einstein Notation 4.3 Tensor Calculus 5 autoBLAS 5.1 A Simple autoBLAS Example 5.2 Design 6 Conclusions References Algorithms for Big Data Problems in de Novo Genome Assembly 1 Reduction of Input Data in Genome Assembly 1.1 Reads, Coverage and Assembly 1.2 The Bignorm Algorithm 2 Counting k–mers in External Memory (EM) 2.1 Counting in RAM 2.2 Counting in External Memory 2.3 Counting Using a Bloomfilter 3 A Streaming Algorithm for the Longest Path Problem 3.1 Our Tree-Based Algorithm 3.2 Linear Complexity of the Streaming Algorithm 4 An One Pass Streaming Algorithm for Computing the Euler Tour in Graphs 4.1 The W-Streaming Model and a Lower Bound 4.2 The Problem of Cycle Merging 4.3 The W-Streaming Algorithm and Its Analysis References Scalable Text Index Construction 1 Introduction 2 Preliminaries 2.1 Models of Computation 2.2 Building Blocks 3 Text Indices 3.1 Scalable Suffix Array Construction 3.2 Compressed Full-Text Index 3.3 Suffix Trees 3.4 Query Answering 4 Applications 4.1 Bioinformatics 4.2 Compression 5 Conclusion and Future Work References Author Index

دانلود کتاب Algorithms for Big Data : DFG Priority Program 1736