Beginning Apache Cassandra Development
About the book "Beginning Apache Cassandra Development" by Vivek Mishra, published by Apress in 2014. The book is available in PDF format, in English. "Beginning Apache Cassandra Development" is listed as uncategorized.
__Beginning Apache Cassandra Development__ introduces you to one of the most robust and best-performing NoSQL database platforms on the planet. Apache Cassandra is a distributed, wide-column NoSQL database. It is specifically designed to manage large amounts of data across many commodity servers with no single point of failure. This design makes Apache Cassandra a robust and easy-to-implement platform when high availability is needed. Developers can use Apache Cassandra from Java, PHP, Python, and JavaScript, the primary and most commonly used languages. In __Beginning Apache Cassandra Development__, author and Cassandra expert Vivek Mishra takes you through using Apache Cassandra from each of these languages. Mishra also covers the Cassandra Query Language (CQL), the Apache Cassandra analog to SQL. You'll learn to develop applications that source data from Cassandra, query that data, and deliver it at speed to your application's users. Cassandra is one of the leading NoSQL databases, meaning you get unparalleled throughput and performance without the sort of processing overhead that comes with traditional proprietary databases. __Beginning Apache Cassandra Development__ will therefore help you create applications that generate search results quickly, stand up to high levels of demand, scale as your user base grows, ensure operational simplicity, and, not least, provide delightful user experiences.
Table of contents

Front matter
Contents at a Glance; Contents; About the Author; About the Technical Reviewer; Acknowledgments; Introduction

Chapter 1: NoSQL: Cassandra Basics
Introducing NoSQL; NoSQL Ecosystem; CAP Theorem; Budding Schema; Scalability; No Single Point of Failure; High Availability; Identifying the Big Data Problem; Introducing Cassandra; Distributed Databases; Peer-to-Peer Design; Configurable Data Consistency; Write Consistency; Read Consistency; Cassandra Query Language (CQL); Installing Cassandra; Logging in Cassandra; Application Logging Options; Changing Log Properties; Managing Logs via JConsole; Understanding Cassandra Configuration; Commit Log Archival; archive_command; restore_command; Configuring Replication and Data Center; LocalStrategy; NetworkTopologyStrategy; SimpleStrategy; Cassandra Multiple Node Configuration; Configuring Multiple Nodes over a Single Machine; Configuring Multiple Nodes over Amazon EC2; Summary

Chapter 2: Cassandra Data Modeling
Introducing Data Modeling; Data Types; Dynamic Columns; Dynamic Columns via Thrift; Dynamic Columns via cqlsh Using Map Support; Dynamic Columns via cqlsh Using Set Support; Secondary Indexes; CQL3 and Thrift Interoperability; Changing Data Types; Thrift Way; CQL3 Way; Counter Column; Counter Column with and without replicate_on_write; Play with Counter Columns; Data Modeling Tips; Summary

Chapter 3: Indexes and Composite Columns
Indexes; Clustered Indexes vs. Non-Clustered Indexes; Index Distribution; Indexing in Cassandra; Secondary Indexes; Composite Columns; Allow Filtering; Expiring Columns; Default TTL; Data Partitioning; Changing Partitioners; Data Colocation; Cassandra Writes; Cassandra Reads; What’s New in Cassandra 2.0; Compare and Set; Algorithm; Using CAS; Secondary Index over Composite Columns; Conditional DDL; Summary

Chapter 4: Cassandra Data Security
Authentication and Authorization; system and system_auth Keyspaces; The system Keyspace Is Unmodifiable; Accessing system_auth Keyspace with Authentication Enabled; Managing User Permissions; Accessing system_auth with AllowAllAuthorizer; Preparing Server Certificates; Connecting with SSL Encryption; Connecting via Cassandra-cli; Connecting via cqlsh; Connecting via the Cassandra Thrift Client; Summary

Chapter 5: MapReduce with Cassandra
Batch Processing and MapReduce; Apache Hadoop; HDFS; MapReduce; Read and Store Tweets into HDFS; Reading Tweets; Storing Tweets into HDFS; Cassandra MapReduce Integration; Reading Tweets from HDFS and Storing Count Results into Cassandra; The Thrift Way; The CQL3 Way; Cassandra In and Cassandra Out; Stream or Real-Time Analytics; Summary

Chapter 6: Data Migration and Analytics
Data Migration and Analytics; Apache Pig; Setup and Installation; Understanding Pig; Pig Execution Modes; Local Mode; MapReduce Mode; Data Types; Simple Data Types; Complex Data Types; PigStorage; LOAD; STORE; FILTER; FOREACH; TOTUPLE; Counting Tweets; Pig with Cassandra; Data Import; Loading Data with timeuuid; Apache Hive; Setup and Configuration; Understanding UDF, UDAF, and UDTF; Hive Tables; Local FS Data Loading; HDFS Data Loading; Hive External Table; Hive with Cassandra; Data Migration; In the Traditional Way; Apache Sqoop; Sqoop with Cassandra; Summary

Chapter 7: Titan Graph Databases with Cassandra
Introduction to Graphs; Simple and Nonsimple Graphs; Directed and Undirected Graphs; Cyclic and Acyclic Graphs; Open Source Software for Graphs; Graph Frameworks: TinkerPop; Pipes; Gremlin; Frames; Rexster; Furnace; Blueprints; Graph as a Database; Neo4J; OrientDB; InfiniteGraph; Titan; Titan Graph Databases; Basic Concepts; Vertex-Centric Indices; Edge Compression; Graph Partitioning; Backend Stores; Transaction Handling; Setup and Installation; Command-line Tools and Clients; Gremlin Shell; Rexster: Server, Rest API, and the Dog House; Rexster Dog House; Rexster REST API; Titan with Cassandra; Titan Java API; Cassandra for Backend Storage; Use Cases; Writing Data to a Graph; Reading from the Graph; Batch Loading; The Supernode Problem; Faster Deep Traversal; Summary

Chapter 8: Cassandra Performance Tuning
Understanding the Key Performance Indicators; CPU and Memory Utilization; Heavy Read/Write Throughput and Latency; Logical and Physical Reads; Cassandra Configuration; Data Caches; Cache Directory; Key Cache; populate_io_cache_on_flush; Row Cache; Bloom Filters; Off-Heap vs. On-Heap; Installing and Configuring jemalloc; Garbage Collection; Hinted Handoff; Heap Size Configuration; Cassandra Stress Testing; Write Mode; Read Mode; Monitoring; Compaction Strategy; Size-Tiered Compaction Strategy (STCS); Leveled Compaction Strategy (LCS); Yahoo Cloud Serving Benchmarking; Summary

Chapter 9: Cassandra: Administration and Monitoring
Adding Nodes to Cassandra Cluster; Replacing a Dead Node; Data Backup and Restoration; Using nodetool snapshot and sstableloader; Using nodetool refresh; Using clearsnapshot; Cassandra Monitoring Tools; Helenos; DataStax DevCenter and OpsCenter; OpsCenter; DevCenter; Summary

Chapter 10: Cassandra Utilities
Cassandra nodetool Utility; Ring Management; Checking Ring Status; Decommissioning a Node; Schema Management; cfstats; cfhistogram; cleanup; clearsnapshot; flush; repair; rebuild; rebuild_index; JSONifying Data; Exporting Data to JSON Files with sstable2json; Importing JSON Data with json2sstable; Cassandra Bulk Loading; Summary

Chapter 11: Upgrading Cassandra and Troubleshooting
Cassandra 2.1; User-Defined Types; Frozen Types; Indexing on Collection Attributes; Upgrading Cassandra Versions; Backward Compatibility; Performing an Upgrade with a Rolling Restart; Troubleshooting Cassandra; Too Many Open Files; Stack Size Limit; Out of Memory Errors; Too Much Garbage Collection Activity; Road Ahead with Cassandra; Summary; References

Index

What you'll learn
- Configure Apache Cassandra clusters
- Model your data for high throughput
- Implement MapReduce algorithms
- Run Hive and Pig queries over Cassandra
- Query with the Cassandra Query Language
- Build graph-based solutions with Cassandra Titan
- Back up your data and restore when needed
- Encrypt and secure your data

Who this book is for
Beginning Apache Cassandra Development is aimed at developers who want a high-performing, highly available database from which to serve large amounts of data at speed to application users. The book is especially suited to developers working in Java, PHP, Python, and JavaScript who are interested in a NoSQL solution.
Chapters and page ranges
- Front Matter: pages i-xxi
- NoSQL: Cassandra Basics: pages 1-26
- Cassandra Data Modeling: pages 27-42
- Indexes and Composite Columns: pages 43-60
- Cassandra Data Security: pages 61-78
- MapReduce with Cassandra: pages 79-96
- Data Migration and Analytics: pages 97-121
- Titan Graph Databases with Cassandra: pages 123-151
- Cassandra Performance Tuning: pages 153-169
- Cassandra: Administration and Monitoring: pages 171-189
- Cassandra Utilities: pages 191-208
- Upgrading Cassandra and Troubleshooting: pages 209-216
- Back Matter: pages 217-222