Balian Blog

Practical Python data wrangling and data quality : getting started with reading, cleaning, and analyzing data

Introducing the book "Practical Python data wrangling and data quality : getting started with reading, cleaning, and analyzing data" by Susan E. McGregor, published by O'Reilly Media in 2021. The book is offered as a 5-page PDF in English. It is listed under "Uncategorized".

The world around us is full of data that holds unique insights and valuable stories, and this book will help you uncover them. Whether you already work with data or want to learn more about its possibilities, the examples and techniques in this practical book will help you more easily clean, evaluate, and analyze data so that you can generate meaningful insights and compelling visualizations. Complementing foundational concepts with expert advice, author Susan E. McGregor provides the resources you need to extract, evaluate, and analyze a wide variety of data sources and formats, along with the tools to communicate your findings effectively. This book delivers a methodical, jargon-free way for data practitioners at any level, from true novices to seasoned professionals, to harness the power of data.

- Use Python 3.8+ to read, write, and transform data from a variety of sources
- Understand and use programming basics in Python to wrangle data at scale
- Organize, document, and structure your code using best practices
- Collect data from structured data files, web pages, and APIs
- Perform basic statistical analyses to make meaning from datasets
- Visualize and present data in clear and compelling ways

Table of contents:

Preface
  Who Should Read This Book?
  Who Shouldn’t Read This Book?
  What to Expect from This Volume
  Conventions Used in This Book
  Using Code Examples
  O’Reilly Online Learning
  How to Contact Us
  Acknowledgments

1. Introduction to Data Wrangling and Data Quality
  What Is “Data Wrangling”?
  What Is “Data Quality”?
  Data Integrity
  Data “Fit”
  Why Python?
  Versatility
  Accessibility
  Readability
  Community
  Python Alternatives
  Writing and “Running” Python
  Working with Python on Your Own Device
  Getting Started with the Command Line
  Installing Python, Jupyter Notebook, and a Code Editor
  Working with Python Online
  Hello World!
  Using Atom to Create a Standalone Python File
  Using Jupyter to Create a New Python Notebook
  Using Google Colab to Create a New Python Notebook
  Adding the Code
  In a Standalone File
  In a Notebook
  Running the Code
  In a Standalone File
  In a Notebook
  Documenting, Saving, and Versioning Your Work
  Documenting
  Saving
  Versioning
  Conclusion

2. Introduction to Python
  The Programming “Parts of Speech”
  Nouns ≈ Variables
  Verbs ≈ Functions
  Cooking with Custom Functions
  Libraries: Borrowing Custom Functions from Other Coders
  Taking Control: Loops and Conditionals
  In the Loop
  One Condition…
  Understanding Errors
  Syntax Snafus
  Runtime Runaround
  Logic Loss
  Hitting the Road with Citi Bike Data
  Starting with Pseudocode
  Seeking Scale
  Conclusion

3. Understanding Data Quality
  Assessing Data Fit
  Validity
  Reliability
  Representativeness
  Assessing Data Integrity
  Necessary, but Not Sufficient
  Important
  Achievable
  Improving Data Quality
  Data Cleaning
  Data Augmentation
  Conclusion

4. Working with File-Based and Feed-Based Data in Python
  Structured Versus Unstructured Data
  Working with Structured Data
  File-Based, Table-Type Data—Take It to Delimit
  Wrangling Table-Type Data with Python
  Real-World Data Wrangling: Understanding Unemployment
  XLSX, ODS, and All the Rest
  Finally, Fixed-Width
  Feed-Based Data—Web-Driven Live Updates
  Wrangling Feed-Type Data with Python
  Working with Unstructured Data
  Image-Based Text: Accessing Data in PDFs
  Wrangling PDFs with Python
  Accessing PDF Tables with Tabula
  Conclusion

5. Accessing Web-Based Data
  Accessing Online XML and JSON
  Introducing APIs
  Basic APIs: A Search Engine Example
  Specialized APIs: Adding Basic Authentication
  Getting a FRED API Key
  Using Your API key to Request Data
  Reading API Documentation
  Protecting Your API Key When Using Python
  Creating Your “Credentials” File
  Using Your Credentials in a Separate Script
  Getting Started with .gitignore
  Specialized APIs: Working With OAuth
  Applying for a Twitter Developer Account
  Creating Your Twitter “App” and Credentials
  Encoding Your API Key and Secret
  Requesting an Access Token and Data from the Twitter API
  API Ethics
  Web Scraping: The Data Source of Last Resort
  Carefully Scraping the MTA
  Using Browser Inspection Tools
  The Python Web Scraping Solution: Beautiful Soup
  Conclusion

6. Assessing Data Quality
  The Pandemic and the PPP
  Assessing Data Integrity
  Is It of Known Pedigree?
  Is It Timely?
  Is It Complete?
  Is It Well-Annotated?
  Is It High Volume?
  Is It Consistent?
  Is It Multivariate?
  Is It Atomic?
  Is It Clear?
  Is It Dimensionally Structured?
  Assessing Data Fit
  Validity
  Reliability
  Representativeness
  Conclusion

7. Cleaning, Transforming, and Augmenting Data
  Selecting a Subset of Citi Bike Data
  A Simple Split
  Regular Expressions: Supercharged String Matching
  Making a Date
  De-crufting Data Files
  Decrypting Excel Dates
  Generating True CSVs from Fixed-Width Data
  Correcting for Spelling Inconsistencies
  The Circuitous Path to “Simple” Solutions
  Gotchas That Will Get Ya!
  Augmenting Your Data
  Conclusion

8. Structuring and Refactoring Your Code
  Revisiting Custom Functions
  Will You Use It More Than Once?
  Is It Ugly and Confusing?
  Do You Just Really Hate the Default Functionality?
  Understanding Scope
  Defining the Parameters for Function “Ingredients”
  What Are Your Options?
  Getting Into Arguments?
  Return Values
  Climbing the “Stack”
  Refactoring for Fun and Profit
  A Function for Identifying Weekdays
  Metadata Without the Mess
  Documenting Your Custom Scripts and Functions with pydoc
  The Case for Command-Line Arguments
  Where Scripts and Notebooks Diverge
  Conclusion

9. Introduction to Data Analysis
  Context Is Everything
  Same but Different
  What’s Typical? Evaluating Central Tendency
  What’s That Mean?
  Embrace the Median
  Think Different: Identifying Outliers
  Visualization for Data Analysis
  What’s Our Data’s Shape? Understanding Histograms
  The Significance of Symmetry
  Counting “Clusters”
  The $2 Million Question
  Proportional Response
  Conclusion

10. Presenting Your Data
  Foundations for Visual Eloquence
  Making Your Data Statement
  Charts, Graphs, and Maps: Oh My!
  Pie Charts
  Bar and Column Charts
  Line Charts
  Scatter Charts
  Maps
  Elements of Eloquent Visuals
  The “Finicky” Details Really Do Make a Difference
  Trust Your Eyes (and the Experts)
  Selecting Scales
  Choosing Colors
  Above All, Annotate!
  From Basic to Beautiful: Customizing a Visualization with seaborn and matplotlib
  Beyond the Basics
  Conclusion

11. Beyond Python
  Additional Tools for Data Review
  Spreadsheet Programs
  OpenRefine
  Additional Tools for Sharing and Presenting Data
  Image Editing for JPGs, PNGs, and GIFs
  Software for Editing SVGs and Other Vector Formats
  Reflecting on Ethics
  Conclusion

A. More Python Programming Resources
  Official Python Documentation
  Installing Python Resources
  Where to Look for Libraries
  Keeping Your Tools Sharp
  Where to Learn More

B. A Bit More About Git
  You Run git push/pull and End Up in a Weird Text Editor
  Your git push/pull Command Gets Rejected
  Run git pull
  Git Quick Reference

C. Finding Data
  Data Repositories and APIs
  Subject Matter Experts
  FOIA/L Requests
  Custom Data Collection

D. Resources for Visualization and Information Design
  Foundational Books on Information Visualization
  The Quick Reference You’ll Reach For
  Sources of Inspiration

Index

About the Author
Download the book Practical Python data wrangling and data quality : getting started with reading, cleaning, and analyzing data