Beliyan Blog

Programming Entity Framework: DbContext

An introduction to the book "Programming Entity Framework: DbContext" by Julia Lerman and Rowan Miller, published by Yahoo Press in 2012. The book is available in PDF format, in English. "Programming Entity Framework: DbContext" is listed under the Uncategorized category.

Table of Contents

Foreword

Preface
Administrative Notes; What's in This Book?; What's New in the Second Edition?; What's New in the Third Edition?; Conventions Used in This Book; Using Code Examples; Safari® Books Online; How to Contact Us; Acknowledgments

Chapter 1. Meet Hadoop
Data!; Data Storage and Analysis; Comparison with Other Systems; Relational Database Management System; Grid Computing; Volunteer Computing; A Brief History of Hadoop; Apache Hadoop and the Hadoop Ecosystem; Hadoop Releases; What's Covered in This Book; Configuration names; MapReduce APIs; Compatibility

Chapter 2. MapReduce
A Weather Dataset; Data Format; Analyzing the Data with Unix Tools; Analyzing the Data with Hadoop; Map and Reduce; Java MapReduce; A test run; The old and the new Java MapReduce APIs; Scaling Out; Data Flow; Combiner Functions; Specifying a combiner function; Running a Distributed MapReduce Job; Hadoop Streaming; Ruby; Python; Hadoop Pipes; Compiling and Running

Chapter 3. The Hadoop Distributed Filesystem
The Design of HDFS; HDFS Concepts; Blocks; Namenodes and Datanodes; HDFS Federation; HDFS High-Availability; Failover and fencing; The Command-Line Interface; Basic Filesystem Operations; Hadoop Filesystems; Interfaces; HTTP; C; FUSE; The Java Interface; Reading Data from a Hadoop URL; Reading Data Using the FileSystem API; FSDataInputStream; Writing Data; FSDataOutputStream; Directories; Querying the Filesystem; File metadata: FileStatus; Listing files; File patterns; PathFilter; Deleting Data; Data Flow; Anatomy of a File Read; Anatomy of a File Write; Coherency Model; Consequences for application design; Data Ingest with Flume and Sqoop; Parallel Copying with distcp; Keeping an HDFS Cluster Balanced; Hadoop Archives; Using Hadoop Archives; Limitations

Chapter 4. Hadoop I/O
Data Integrity; Data Integrity in HDFS; LocalFileSystem; ChecksumFileSystem; Compression; Codecs; Compressing and decompressing streams with CompressionCodec; Inferring CompressionCodecs using CompressionCodecFactory; Native libraries; CodecPool; Compression and Input Splits; Using Compression in MapReduce; Compressing map output; Serialization; The Writable Interface; WritableComparable and comparators; Writable Classes; Writable wrappers for Java primitives; Text; Indexing; Unicode; Iteration; BytesWritable; Mutability; Resorting to String; NullWritable; ObjectWritable and GenericWritable; Writable collections; Implementing a Custom Writable; Implementing a RawComparator for speed; Custom comparators; Serialization Frameworks; Serialization IDL; Avro; Avro Data Types and Schemas; In-Memory Serialization and Deserialization; The specific API; Avro Datafiles; Interoperability; Python API; C API; Schema Resolution; Sort Order; Avro MapReduce; Sorting Using Avro MapReduce; Avro MapReduce in Other Languages; File-Based Data Structures; SequenceFile; Writing a SequenceFile; Reading a SequenceFile; Displaying a SequenceFile with the command-line interface; Sorting and merging SequenceFiles; The SequenceFile format; MapFile; Writing a MapFile; Reading a MapFile; MapFile variants; Converting a SequenceFile to a MapFile

Chapter 5. Developing a MapReduce Application
The Configuration API; Combining Resources; Variable Expansion; Setting Up the Development Environment; Managing Configuration; GenericOptionsParser, Tool, and ToolRunner; Writing a Unit Test with MRUnit; Mapper; Reducer; Running Locally on Test Data; Running a Job in a Local Job Runner; Fixing the mapper; Testing the Driver; Running on a Cluster; Packaging a Job; The client classpath; The task classpath; Packaging dependencies; Task classpath precedence; Launching a Job; The MapReduce Web UI; The jobtracker page; The job page; Retrieving the Results; Debugging a Job; The tasks page; The task details page; Handling malformed data; Hadoop Logs; Remote Debugging; Tuning a Job; Profiling Tasks; The HPROF profiler; Other profilers; MapReduce Workflows; Decomposing a Problem into MapReduce Jobs; JobControl; Apache Oozie; Defining an Oozie workflow; Packaging and deploying an Oozie workflow application; Running an Oozie workflow job

Chapter 6. How MapReduce Works
Anatomy of a MapReduce Job Run; Classic MapReduce (MapReduce 1); Job submission; Job initialization; Task assignment; Task execution; Progress and status updates; Streaming and pipes; Job completion; YARN (MapReduce 2); Job submission; Job initialization; Task assignment; Task execution; Progress and status updates; Job completion; Failures; Failures in Classic MapReduce; Task failure; Tasktracker failure; Jobtracker failure; Failures in YARN; Task failure; Application master failure; Node manager failure; Resource manager failure; Job Scheduling; The Fair Scheduler; The Capacity Scheduler; Shuffle and Sort; The Map Side; The Reduce Side; Configuration Tuning; Task Execution; The Task Execution Environment; Streaming environment variables; Speculative Execution; Output Committers; Task side-effect files; Task JVM Reuse; Skipping Bad Records

Chapter 7. MapReduce Types and Formats
MapReduce Types; The Default MapReduce Job; The default Streaming job; Keys and values in Streaming; Input Formats; Input Splits and Records; FileInputFormat; FileInputFormat input paths; FileInputFormat input splits; Small files and CombineFileInputFormat; Preventing splitting; File information in the mapper; Processing a whole file as a record; Text Input; TextInputFormat; KeyValueTextInputFormat; NLineInputFormat; XML; Binary Input; SequenceFileInputFormat; SequenceFileAsTextInputFormat; SequenceFileAsBinaryInputFormat; Multiple Inputs; Database Input (and Output); Output Formats; Text Output; Binary Output; SequenceFileOutputFormat; SequenceFileAsBinaryOutputFormat; MapFileOutputFormat; Multiple Outputs; An example: Partitioning data; MultipleOutputs; Lazy Output; Database Output

Chapter 8. MapReduce Features
Counters; Built-in Counters; Task counters; Job counters; User-Defined Java Counters; Dynamic counters; Readable counter names; Retrieving counters; User-Defined Streaming Counters; Sorting; Preparation; Partial Sort; An application: Partitioned MapFile lookups; Total Sort; Secondary Sort; Java code; Streaming; Joins; Map-Side Joins; Reduce-Side Joins; Side Data Distribution; Using the Job Configuration; Distributed Cache; Usage; How it works; The distributed cache API; MapReduce Library Classes

Chapter 9. Setting Up a Hadoop Cluster
Cluster Specification; Network Topology; Rack awareness; Cluster Setup and Installation; Installing Java; Creating a Hadoop User; Installing Hadoop; Testing the Installation; SSH Configuration; Hadoop Configuration; Configuration Management; Control scripts; Master node scenarios; Environment Settings; Memory; Java; System logfiles; SSH settings; Important Hadoop Daemon Properties; HDFS; MapReduce; Hadoop Daemon Addresses and Ports; Other Hadoop Properties; Cluster membership; Buffer size; HDFS block size; Reserved storage space; Trash; Job scheduler; Reduce slow start; Task memory limits; User Account Creation; YARN Configuration; Important YARN Daemon Properties; Memory; YARN Daemon Addresses and Ports; Security; Kerberos and Hadoop; An example; Delegation Tokens; Other Security Enhancements; Benchmarking a Hadoop Cluster; Hadoop Benchmarks; Benchmarking HDFS with TestDFSIO; Benchmarking MapReduce with Sort; Other benchmarks; User Jobs; Hadoop in the Cloud; Apache Whirr; Setup; Launching a cluster; Configuration; Running a proxy; Running a MapReduce job; Shutting down a cluster

Chapter 10. Administering Hadoop
HDFS; Persistent Data Structures; Namenode directory structure; The filesystem image and edit log; Secondary namenode directory structure; Datanode directory structure; Safe Mode; Entering and leaving safe mode; Audit Logging; Tools; dfsadmin; Filesystem check (fsck); Datanode block scanner; Finding the blocks for a file; Balancer; Monitoring; Logging; Setting log levels; Getting stack traces; Metrics; FileContext; GangliaContext; NullContextWithUpdateThread; CompositeContext; Java Management Extensions; Maintenance; Routine Administration Procedures; Metadata backups; Data backups; Filesystem check (fsck); Filesystem balancer; Commissioning and Decommissioning Nodes; Commissioning new nodes; Decommissioning old nodes; Upgrades; HDFS data and metadata upgrades; Start the upgrade; Wait until the upgrade is complete; Check the upgrade; Roll back the upgrade (optional); Finalize the upgrade (optional)

Chapter 11. Pig
Installing and Running Pig; Execution Types; Local mode; MapReduce mode; Running Pig Programs; Grunt; Pig Latin Editors; An Example; Generating Examples; Comparison with Databases; Pig Latin; Structure; Statements; Expressions; Types; Schemas; Validation and nulls; Schema merging; Functions; Macros; User-Defined Functions; A Filter UDF; Leveraging types; An Eval UDF; Dynamic invokers; A Load UDF; Using a schema; Data Processing Operators; Loading and Storing Data; Filtering Data; FOREACH...GENERATE; STREAM; Grouping and Joining Data; JOIN; COGROUP; CROSS; GROUP; Sorting Data; Combining and Splitting Data; Pig in Practice; Parallelism; Parameter Substitution; Dynamic parameters; Parameter substitution processing

Chapter 12. Hive
Installing Hive; The Hive Shell; An Example; Running Hive; Configuring Hive; Logging; Hive Services; Hive clients; The Metastore; Comparison with Traditional Databases; Schema on Read Versus Schema on Write; Updates, Transactions, and Indexes; HiveQL; Data Types; Primitive types; Complex types; Operators and Functions; Conversions; Tables; Managed Tables and External Tables; Partitions and Buckets; Partitions; Buckets; Storage Formats; The default storage format: Delimited text; Binary storage formats: Sequence files, Avro datafiles and RCFiles; An example: RegexSerDe; Importing Data; Inserts; Multitable insert; CREATE TABLE...AS SELECT; Altering Tables; Dropping Tables; Querying Data; Sorting and Aggregating; MapReduce Scripts; Joins; Inner joins; Outer joins; Semi joins; Map joins; Subqueries; Views; User-Defined Functions; Writing a UDF; Writing a UDAF; A more complex UDAF

Chapter 13. HBase
HBasics; Backdrop; Concepts; Whirlwind Tour of the Data Model; Regions; Locking; Implementation; HBase in operation; Installation; Test Drive; Clients; Java; MapReduce; Avro, REST, and Thrift; REST; Thrift; Avro; Example; Schemas; Loading Data; Optimization notes; Web Queries; HBase Versus RDBMS; Successful Service; HBase; Use Case: HBase at Streamy.com; Very large items tables; Very large sort merges; Life with HBase; Praxis; Versions; HDFS; UI; Metrics; Schema Design; Joins; Row keys; Counters; Bulk Load

Chapter 14. ZooKeeper
Installing and Running ZooKeeper; An Example; Group Membership in ZooKeeper; Creating the Group; Joining a Group; Listing Members in a Group; ZooKeeper command-line tools; Deleting a Group; The ZooKeeper Service; Data Model; Ephemeral znodes; Sequence numbers; Watches; Operations; Multiupdate; APIs; Watch triggers; ACLs; Implementation; Consistency; Sessions; Time; States; Building Applications with ZooKeeper; A Configuration Service; The Resilient ZooKeeper Application; InterruptedException; KeeperException; State exceptions; A reliable configuration service; Recoverable exceptions; Unrecoverable exceptions; A Lock Service; The herd effect; Recoverable exceptions; Unrecoverable exceptions; Implementation; More Distributed Data Structures and Protocols; BookKeeper and Hedwig; ZooKeeper in Production; Resilience and Performance; Configuration

Chapter 15. Sqoop
Getting Sqoop; Sqoop Connectors; A Sample Import; Text and Binary File Formats; Generated Code; Additional Serialization Systems; Imports: A Deeper Look; Controlling the Import; Imports and Consistency; Direct-mode Imports; Working with Imported Data; Imported Data and Hive; Importing Large Objects; Performing an Export; Exports: A Deeper Look; Exports and Transactionality; Exports and SequenceFiles

Chapter 16. Case Studies
Hadoop Usage at Last.fm; Last.fm: The Social Music Revolution; Hadoop at Last.fm; Generating Charts with Hadoop; The Track Statistics Program; Calculating the number of unique listeners; UniqueListenersMapper; UniqueListenersReducer; Summing the track totals; SumMapper; SumReducer; Merging the results; MergeListenersMapper; IdentityMapper; SumReducer; Summary; Hadoop and Hive at Facebook; Hadoop at Facebook; History; Use cases; Data architecture; Hadoop configuration; Hypothetical Use Case Studies; Advertiser insights and performance; Ad hoc analysis and product feedback; Data analysis; Hive; Data organization; Query language; Data pipelines using Hive; Problems and Future Work; Fair sharing; Space management; Scribe-HDFS integration; Improvements to Hive; Nutch Search Engine; Data Structures; CrawlDb; LinkDb; Segments; Selected Examples of Hadoop Data Processing in Nutch; Link inversion; Generation of fetchlists; Step 1: Select, sort by score, limit by URL count per host; Step 2: Invert, partition by host, sort randomly; Fetcher: A multithreaded MapRunner in action; Indexer: Using custom OutputFormat; Summary; Log Processing at Rackspace; Requirements/The Problem; Logs; Brief History; Choosing Hadoop; Collection and Storage; Log collection; Log storage; MapReduce for Logs; Processing; Phase 1: Map; Phase 1: Reduce; Phase 2: Map; Phase 2: Reduce; Merging for near-term search; Archiving for analysis; Sharding; Search results; Cascading; Fields, Tuples, and Pipes; Operations; Taps, Schemes, and Flows; Cascading in Practice; Flexibility; Hadoop and Cascading at ShareThis; Summary; TeraByte Sort on Apache Hadoop; Using Pig and Wukong to Explore Billion-edge Network Graphs; Measuring Community; Everybody's Talkin' at Me: The Twitter Reply Graph; Edge pairs versus adjacency list; Degree; Symmetric Links; Community Extraction; Get neighbors; Community metrics and the 1 million × 1 million problem; Local properties at global scale

Appendix A. Installing Apache Hadoop
Prerequisites; Installation; Configuration; Standalone Mode; Pseudodistributed Mode; Configuring SSH; Formatting the HDFS filesystem; Starting and stopping the daemons (MapReduce 1); Starting and stopping the daemons (MapReduce 2); Fully Distributed Mode

Appendix B. Cloudera's Distribution Including Apache Hadoop

Appendix C. Preparing the NCDC Weather Data

Index

Ready to unlock the power of your data? With this comprehensive guide, you'll learn how to build and maintain reliable, scalable, distributed systems with Apache Hadoop. This book is ideal for programmers looking to analyze datasets of any size, and for administrators who want to set up and run Hadoop clusters. You'll find illuminating case studies that demonstrate how Hadoop is used to solve specific problems. This third edition covers recent changes to Hadoop, including material on the new MapReduce API, as well as MapReduce 2 and its more flexible execution model (YARN).

- Store large datasets with the Hadoop Distributed File System (HDFS)
- Run distributed computations with MapReduce
- Use Hadoop's data and I/O building blocks for compression, data integrity, serialization (including Avro), and persistence
- Discover common pitfalls and advanced features for writing real-world MapReduce programs
- Design, build, and administer a dedicated Hadoop cluster, or run Hadoop in the cloud
- Load data from relational databases into HDFS, using Sqoop
- Perform large-scale data processing with the Pig query language
- Analyze datasets with Hive, Hadoop's data warehousing system
- Take advantage of HBase for structured and semi-structured data, and ZooKeeper for building distributed systems
Download the book Programming Entity Framework: DbContext