وبلاگ بلیان

Algorithms and Parallel Computing (Wiley Series on Parallel and Distributed Computing)

معرفی کتاب «Algorithms and Parallel Computing (Wiley Series on Parallel and Distributed Computing)» نوشتهٔ Fayez Gebali; Wiley InterScience (Online service)، منتشرشده توسط نشر John Wiley & Sons در سال 2011. این کتاب در فرمت pdf، زبان انگلیسی ارائه شده است. «Algorithms and Parallel Computing (Wiley Series on Parallel and Distributed Computing)» در دستهٔ بدون دسته‌بندی قرار دارد.

There is a software gap between the hardware potential and the performance that can be attained using today's software parallel program development tools. The tools need manual intervention by the programmer to parallelize the code. Programming a parallel computer requires closely studying the target algorithm or application, more so than in the traditional sequential programming we have all learned. The programmer must be aware of the communication and data dependencies of the algorithm or application. This book provides the techniques to explore the possible ways to program a parallel computer for a given application.Content: Chapter 1 Introduction (pages 1–27): Chapter 2 Enhancing Uniprocessor Performance (pages 29–51): Chapter 3 Parallel Computers (pages 53–68): Chapter 4 Shared?Memory Multiprocessors (pages 69–82): Chapter 5 Interconnection Networks (pages 83–103): Chapter 6 Concurrency Platforms (pages 105–130): Chapter 7 Ad Hoc Techniques for Parallel Algorithms (pages 131–142): Chapter 8 Nonserial–Parallel Algorithms (pages 143–157): Chapter 9 z?Transform Analysis (pages 159–165): Chapter 10 Dependence Graph Analysis (pages 167–183): Chapter 11 Computational Geometry Analysis (pages 185–208): Chapter 12 Case Study: One?Dimensional IIR Digital Filters (pages 209–218): Chapter 13 Case Study: Two? and Three?Dimensional Digital Filters (pages 219–226): Chapter 14 Case Study: Multirate Decimators and Interpolators (pages 227–244): Chapter 15 Case Study: Pattern Matching (pages 245–254): Chapter 16 Case Study: Motion Estimation for Video Compression (pages 255–266): Chapter 17 Case Study: Multiplication over GF(2m) (pages 267–277): Chapter 18 Case Study: Polynomial Division over GF(2) (pages 279–291): Chapter 19 The Fast Fourier Transform (pages 293–303): Chapter 20 Solving Systems of Linear Equations (pages 305–321): Chapter 21 Solving Partial Differential Equations Using Finite Difference Method (pages 323–329): Algorithms and Parallel Computing......Page 5 Contents......Page 9 Preface......Page 15 List of Acronyms......Page 21 1.1 INTRODUCTION......Page 25 1.2 TOWARD AUTOMATING PARALLEL PROGRAMMING......Page 26 1.3 ALGORITHMS......Page 28 1.4 PARALLEL COMPUTING DESIGN CONSIDERATIONS......Page 36 1.5 PARALLEL ALGORITHMS AND PARALLEL ARCHITECTURES......Page 37 1.7 IMPLEMENTATION OF ALGORITHMS: A TWO-SIDED PROBLEM......Page 38 1.8 MEASURING BENEFITS OF PARALLEL COMPUTING......Page 39 1.9 AMDAHL’S LAW FOR MULTIPROCESSOR SYSTEMS......Page 43 1.10 GUSTAFSON–BARSIS’S LAW......Page 45 1.11 APPLICATIONS OF PARALLEL COMPUTING......Page 46 2.1 INTRODUCTION......Page 53 2.3 PARALLELIZING ALU STRUCTURE......Page 54 2.4 USING MEMORY HIERARCHY......Page 57 2.5 PIPELINING......Page 63 2.6 VERY LONG INSTRUCTION WORD (VLIW) PROCESSORS......Page 68 2.7 INSTRUCTION-LEVEL PARALLELISM (ILP) AND SUPERSCALAR PROCESSORS......Page 69 2.8 MULTITHREADED PROCESSOR......Page 73 3.2 PARALLEL COMPUTING......Page 77 3.3 SHARED-MEMORY MULTIPROCESSORS (UNIFORM MEMORY ACCESS [UMA])......Page 78 3.4 DISTRIBUTED-MEMORY MULTIPROCESSOR (NONUNIFORM MEMORY ACCESS [NUMA])......Page 80 3.6 SYSTOLIC PROCESSORS......Page 81 3.8 GRID (CLOUD) COMPUTING......Page 84 3.9 MULTICORE SYSTEMS......Page 85 3.10 SM......Page 86 3.11 COMMUNICATION BETWEEN PARALLEL PROCESSORS......Page 88 3.12 SUMMARY OF PARALLEL ARCHITECTURES......Page 91 4.1 INTRODUCTION......Page 93 4.2 CACHE COHERENCE AND MEMORY CONSISTENCY......Page 94 4.3 SYNCHRONIZATION AND MUTUAL EXCLUSION......Page 100 5.1 INTRODUCTION......Page 107 5.2 CLASSIFICATION OF INTERCONNECTION NETWORKS BY LOGICAL TOPOLOGIES......Page 108 5.3 INTERCONNECTION NETWORK SWITCH ARCHITECTURE......Page 115 6.2 CONCURRENCY PLATFORMS......Page 129 6.3 CILK++......Page 130 6.4 OpenMP......Page 136 6.5 COMPUTE UNIFIED DEVICE ARCHITECTURE (CUDA)......Page 146 7.1 INTRODUCTION......Page 155 7.3 INDEPENDENT LOOP SCHEDULING......Page 157 7.4 DEPENDENT LOOPS......Page 158 7.6 LOOP UNROLLING......Page 159 7.7 PROBLEM PARTITIONING......Page 160 7.8 DIVIDE-AND-CONQUER (RECURSIVE PARTITIONING) STRATEGIES......Page 161 7.9 PIPELINING......Page 163 8.2 COMPARING DAG AND DCG ALGORITHMS......Page 167 8.3 PARALLELIZING NSPA ALGORITHMS REPRESENTED BY A DAG......Page 169 8.4 FORMAL TECHNIQUE FOR ANALYZING NSPAs......Page 171 8.5 DETECTING CYCLES IN THE ALGORITHM......Page 174 8.6 EXTRACTING SERIAL AND PARALLEL ALGORITHM PERFORMANCE PARAMETERS......Page 175 8.7 USEFUL THEOREMS......Page 177 8.8 PERFORMANCE OF SERIAL AND PARALLEL ALGORITHMS ON PARALLEL COMPUTERS......Page 180 9.2 DEFINITION OF z-TRANSFORM......Page 183 9.3 THE 1-D FIR DIGITAL FILTER ALGORITHM......Page 184 9.4 SOFTWARE AND HARDWARE IMPLEMENTATIONS OF THE z-TRANSFORM......Page 185 9.5 DESIGN 1: USING HORNER’S RULE FOR BROADCAST INPUT AND PIPELINED OUTPUT......Page 186 9.6 DESIGN 2: PIPELINED INPUT AND BROADCAST OUTPUT......Page 187 9.7 DESIGN 3: PIPELINED INPUT AND OUTPUT......Page 188 10.2 THE 1-D FIR DIGITAL FILTER ALGORITHM......Page 191 10.3 THE DEPENDENCE GRAPH OF AN ALGORITHM......Page 192 10.4 DERIVING THE DEPENDENCE GRAPH FOR AN ALGORITHM......Page 193 10.5 THE SCHEDULING FUNCTION FOR THE 1-D FIR FILTER......Page 195 10.6 NODE PROJECTION OPERATION......Page 201 10.7 NONLINEAR PROJECTION OPERATION......Page 203 10.8 SOFTWARE AND HARDWARE IMPLEMENTATIONS OF THE DAG TECHNIQUE......Page 204 11.2 MATRIX MULTIPLICATION ALGORITHM......Page 209 11.3 THE 3-D DEPENDENCE GRAPH AND COMPUTATION DOMAIN D......Page 210 11.5 THE DEPENDENCE MATRICES OF THE ALGORITHM VARIABLES......Page 212 11.6 NULLSPACE OF DEPENDENCE MATRIX: THE BROADCAST SUBDOMAIN B......Page 213 11.7 DESIGN SPACE EXPLORATION: CHOICE OF BROADCASTING VERSUS PIPELINING VARIABLES......Page 216 11.8 DATA SCHEDULING......Page 219 11.9 PROJECTION OPERATION USING THE LINEAR PROJECTION OPERATOR......Page 224 11.10 EFFECT OF PROJECTION OPERATION ON DATA......Page 229 11.11 THE RESULTING MULTITHREADED/MULTIPROCESSOR ARCHITECTURE......Page 230 11.12 SUMMARY OF WORK DONE IN THIS CHAPTER......Page 231 12.3 THE IIR FILTER DEPENDENCE GRAPH......Page 233 12.4 z-DOMAIN ANALYSIS OF 1-D IIR DIGITAL FILTER ALGORITHM......Page 240 13.2 LINE AND FRAME WRAPAROUND PROBLEMS......Page 243 13.3 2-D RECURSIVE FILTERS......Page 245 13.4 3-D DIGITAL FILTERS......Page 247 14.2 DECIMATOR STRUCTURES......Page 251 14.3 DECIMATOR DEPENDENCE GRAPH......Page 252 14.4 DECIMATOR SCHEDULING......Page 254 14.5 DECIMATOR DAG FOR s1 = [1 0]......Page 255 14.6 DECIMATOR DAG FOR s2 = [1 −1]......Page 257 14.8 POLYPHASE DECIMATOR IMPLEMENTATIONS......Page 259 14.9 INTERPOLATOR STRUCTURES......Page 260 14.10 INTERPOLATOR DEPENDENCE GRAPH......Page 261 14.11 INTERPOLATOR SCHEDULING......Page 262 14.12 INTERPOLATOR DAG FOR s1 = [1 0]......Page 263 14.13 INTERPOLATOR DAG FOR s2 = [1 −1]......Page 265 14.15 POLYPHASE INTERPOLATOR IMPLEMENTATIONS......Page 267 15.2 EXPRESSING THE ALGORITHM AS A REGULAR ITERATIVE ALGORITHM (RIA)......Page 269 15.3 OBTAINING THE ALGORITHM DEPENDENCE GRAPH......Page 270 15.4 DATA SCHEDULING......Page 271 15.5 DAG NODE PROJECTION......Page 272 15.6 DESIGN 1: DESIGN SPACE EXPLORATION WHEN s = [1 1]t......Page 273 15.7 DESIGN 2: DESIGN SPACE EXPLORATION WHEN s = [1 −1]t......Page 276 15.8 DESIGN 3: DESIGN SPACE EXPLORATION WHEN s = [1 0]t......Page 277 16.1 INTRODUCTION......Page 279 16.2 FBMAS......Page 280 16.3 DATA BUFFERING REQUIREMENTS......Page 281 16.4 FORMULATION OF THE FBMA......Page 282 16.5 HIERARCHICAL FORMULATION OF MOTION ESTIMATION......Page 283 16.6 HARDWARE DESIGN OF THE HIERARCHY BLOCKS......Page 285 17.1 INTRODUCTION......Page 291 17.2 THE MULTIPLICATION ALGORITHM IN GF (2m)......Page 292 17.4 FIELD MULTIPLICATION DEPENDENCE GRAPH......Page 294 17.5 DATA SCHEDULING......Page 295 17.6 DAG NODE PROJECTION......Page 297 17.8 DESIGN 2: USING d2 = [1 1]t......Page 299 17.10 APPLICATIONS OF FINITE FIELD MULTIPLIERS......Page 301 18.2 THE POLYNOMIAL DIVISION ALGORITHM......Page 303 18.3 THE LFSR DEPENDENCE GRAPH......Page 305 18.4 DATA SCHEDULING......Page 306 18.5 DAG NODE PROJECTION......Page 307 18.6 DESIGN 1: DESIGN SPACE EXPLORATION WHEN s1 = [1 −1]......Page 308 18.7 DESIGN 2: DESIGN SPACE EXPLORATION WHEN s2 = [1 0]......Page 310 18.8 DESIGN 3: DESIGN SPACE EXPLORATION WHEN s3 = [1 −0.5]......Page 313 18.9 COMPARING THE THREE DESIGNS......Page 315 19.1 INTRODUCTION......Page 317 19.2 DECIMATION-IN-TIME FFT......Page 319 19.3 PIPELINE RADIX-2 DECIMATION-IN-TIME FFT PROCESSOR......Page 322 19.4 DECIMATION-IN-FREQUENCY FFT......Page 323 19.5 PIPELINE RADIX-2 DECIMATION-IN-FREQUENCY FFT PROCESSOR......Page 327 20.2 SPECIAL MATRIX STRUCTURES......Page 329 20.3 FORWARD SUBSTITUTION (DIRECT TECHNIQUE)......Page 333 20.5 MATRIX TRIANGULARIZATION ALGORITHM......Page 336 20.6 SUCCESSIVE OVER RELAXATION (SOR) (ITERATIVE TECHNIQUE)......Page 341 20.7 PROBLEMS......Page 345 21.1 INTRODUCTION......Page 347 21.2 FDM FOR 1-D SYSTEMS......Page 348 References......Page 355 Index......Page 361 Frontmatter -- Introduction -- Enhancing Uniprocessor Performance -- Parallel Computers -- Shared-Memory Multiprocessors -- Interconnection Networks -- Concurrency Platforms -- Ad Hoc Techniques for Parallel Algorithms -- Nonserial b6 sParallel Algorithms -- -Transform Analysis -- Dependence Graph Analysis -- Computational Geometry Analysis -- Case Study: One-Dimensional IIR Digital Filters -- Case Study: Two- and Three-Dimensional Digital Filters -- Case Study: Multirate Decimators and Interpolators -- Case Study: Pattern Matching -- Case Study: Motion Estimation for Video Compression -- Case Study: Multiplication over GF(2) -- Case Study: Polynomial Division over GF(2) -- The Fast Fourier Transform -- Solving Systems of Linear Equations -- Solving Partial Differential Equations Using Finite Difference Method -- References -- Index "There is a software gap between the hardware potential and the performance that can be attained using today's software parallel program development tools. The tools need manual intervention by the programmer to parallelize the code. Programming a parallel computer requires closely studying the target algorithm or application, more so than in the traditional sequential programming we have all learned. The programmer must be aware of the communication and data dependencies of the algorithm or application. This book provides the techniques to explore the possible ways to program a parallel computer for a given application"--Provided by publisher. "This book provides the techniques to explore the possible ways to program a parallel computer for a given application"--Provided by publisher. "There is a software gap between the hardware potential and the performance that can be attained using today's software parallel program development tools. The tools need manual intervention by the programmer to parallelize the code. Programming a parallel computer requires closely studying the target algorithm or application, more so than in the traditional sequential programming we have all learned. The programmer must be aware of the communication and data dependencies of the algorithm or application. This book provides the techniques to explore the possible ways to program a parallel computer for a given application"-- Résumé de l'éd
دانلود کتاب Algorithms and Parallel Computing (Wiley Series on Parallel and Distributed Computing)