Hacks, Leaks, and Revelations: The Art of Analyzing Hacked and Leaked Data
About the book "Hacks, Leaks, and Revelations: The Art of Analyzing Hacked and Leaked Data" by Micah Lee, published by No Starch Press in 2023. The book is available in EPUB format, in English. It is listed under the "Uncategorized" category.
Unlock the internet’s treasure trove of public interest data with Hacks, Leaks, and Revelations by Micah Lee, an investigative reporter and security engineer. This hands-on guide blends real-world techniques for researching large datasets with lessons on coding, data authentication, and digital security. All of this is spiced up with gripping stories from the front lines of investigative journalism.

Dive into exposed datasets from a wide array of sources: the FBI, the DHS, police intelligence agencies, extremist groups like the Oath Keepers, and even a Russian ransomware gang. Lee’s own in-depth case studies on disinformation-peddling pandemic profiteers and neo-Nazi chatrooms serve as blueprints for your research.

Gain practical skills in searching massive troves of data for keywords like “antifa” and pinpointing documents with newsworthy revelations. Get a crash course in Python to automate the analysis of millions of files.

From the chapter “An Introduction to Python”: Using Python or other programming languages, you can give your computer precise instructions for performing tasks that existing tools or shell scripts don’t allow. For example, you could write a Python script that scours a million pieces of video metadata to determine where the videos were filmed. In my experience, Python is also simpler, easier to understand, and less error-prone than shell scripts. This chapter provides a crash course on the fundamentals of Python programming. You’ll learn to write and execute Python scripts and use the interactive Python interpreter. You’ll also use Python to do math, define variables, work with strings and Boolean logic, loop through lists of items, and use functions. Future chapters rely on your understanding of these basic skills.

You will also learn how to:
Master encrypted messaging to safely communicate with whistleblowers.
Secure datasets over encrypted channels using Signal, Tor Browser, OnionShare, and SecureDrop.
Harvest data from the BlueLeaks collection of internal memos, financial records, and more from over 200 state, local, and federal agencies.
Probe leaked email archives about offshore detention centers and the Heritage Foundation.
Analyze metadata from videos of the January 6 attack on the US Capitol, sourced from the Parler social network.

Cover
Praise for Hacks, Leaks, and Revelations
Title Page
Copyright
Dedication
About the Author and Technical Reviewer
Acknowledgments
Introduction
  Why I Wrote This Book
  What You’ll Learn
  What You’ll Need
Part I: Sources and Datasets
  1. Protecting Sources and Yourself
    Safely Communicating with Sources
    Working with Public Data
    Protecting Sensitive Information
    Minimizing the Digital Trail
    Working with Hackers and Whistleblowers
    Secure Storage for Datasets
    Low-Sensitivity Datasets
    Medium-Sensitivity Datasets
    High-Sensitivity Datasets
    Authenticating Datasets
    The AFLDS Dataset
    The WikiLeaks Twitter Group Chat
    Redaction
    What Data to Publish
    What to Redact
    Making Requests for Comment
    Password Managers
    Disk Encryption
    Exercise 1-1: Encrypt Your Internal Disk
    Windows
    macOS
    Linux
    Exercise 1-2: Encrypt a USB Disk
    Windows
    macOS
    Linux
    Protecting Yourself from Malicious Documents
    Exercise 1-3: Install and Use Dangerzone
    Summary
  2. Acquiring Datasets
    The End of WikiLeaks
    Distributed Denial of Secrets
    Downloading Datasets with BitTorrent
    The Origins of BlueLeaks
    Exercise 2-1: Download the BlueLeaks Dataset
    Communicating with Encrypted Messaging Apps
    Exercise 2-2: Install and Practice Using Signal
    Encrypting Messages with PGP
    Staying Anonymous Online with Tor and OnionShare
    Exercise 2-3: Play with Tor and OnionShare
    Communicating with My Tea Party Patriots Source
    Other Options for Acquiring Datasets from Sources
    Encrypted USB Drives
    Virtual Private Servers
    Whistleblower Submission Systems
    Summary
Part II: Tools of the Trade
  3. The Command Line Interface
    Introducing the Command Line
    The Shell
    Users and Paths
    User Privileges
    Exercise 3-1: Install Ubuntu in Windows
    Basic Command Line Usage
    Opening a Terminal
    Clearing Your Screen and Exiting the Shell
    Exploring Files and Directories
    Navigating Relative and Absolute Paths
    Changing Directories
    Using the help Argument
    Accessing Man Pages
    Tips for Navigating the Terminal
    Entering Commands with Tab Completion
    Editing Commands
    Dealing with Spaces in Filenames
    Using Single Quotes Around Double Quotes
    Installing and Uninstalling Software with Package Managers
    Exercise 3-2: Manage Packages with Homebrew on macOS
    Exercise 3-3: Manage Packages with apt on Windows or Linux
    Exercise 3-4: Practice Using the Command Line with cURL
    Download a Web Page with cURL
    Save a Web Page to a File
    Text Files vs. Binary Files
    Exercise 3-5: Install the VS Code Text Editor
    Exercise 3-6: Write Your First Shell Script
    Navigate to Your USB Disk
    Create an Exercises Folder
    Open a VS Code Workspace
    Write the Shell Script
    Run the Shell Script
    Exercise 3-7: Clone the Book’s GitHub Repository
    Summary
  4. Exploring Datasets in the Terminal
    Introducing for Loops
    Exercise 4-1: Unzip the BlueLeaks Dataset
    Unzip Files on macOS or Linux
    Unzip Files on Windows
    Organize Your Files
    How the Hacker Obtained the BlueLeaks Data
    Exercise 4-2: Explore BlueLeaks on the Command Line
    Calculate How Much Disk Space Folders Use
    Use Pipes and Sort Output
    Create an Inventory of Filenames in a Dataset
    Count the Files in a Dataset
    Exercise 4-3: Find Revelations in BlueLeaks with grep
    Filter for Documents Mentioning Antifa
    Filter for Certain Types of Files
    Use grep with Regular Expressions
    Search Files in Bulk with grep
    Encrypted Data in the BlueLeaks Dataset
    Data Analysis with Servers in the Cloud
    Exercise 4-4: Set Up a VPS
    Generate an SSH Key
    Add Your Public Key to the Cloud Provider
    Create a VPS
    SSH into Your Server
    Start a Byobu Session
    Install Updates
    Exercise 4-5: Explore the Oath Keepers Dataset Remotely
    Summary
  5. Docker, Aleph, and Making Datasets Searchable
    Introducing Docker and Linux Containers
    Exercise 5-1: Initialize Docker Desktop on Windows and macOS
    Exercise 5-2: Initialize Docker Engine on Linux
    Running Containers with Docker
    Running an Ubuntu Container
    Listing and Killing Containers
    Mounting and Removing Volumes
    Passing Environment Variables
    Running Server Software
    Freeing Up Disk Space
    Exercise 5-3: Run a WordPress Site with Docker Compose
    Make a docker-compose.yaml File
    Start Your WordPress Site
    Introducing Aleph
    Exercise 5-4: Run Aleph Locally in Linux Containers
    Using Aleph’s Web and Command Line Interfaces
    Indexing Data in Aleph
    Exercise 5-5: Index a BlueLeaks Folder in Aleph
    Mount Your Datasets into the Aleph Shell
    Index the icefishx Folder
    Check Indexing Status
    Explore BlueLeaks with Aleph
    Additional Aleph Features
    Dedicated Aleph Servers
    Summary
  6. Reading Other People’s Email
    The Email Protocol and Message Structure
    File Formats for Email Dumps
    EML Files
    MBOX Files
    PST Outlook Data Files
    Exercise 6-1: Download Email Dumps from Three Datasets
    The Nauru Police Force Dataset
    The Oath Keepers Dataset
    The Heritage Foundation Dataset
    Researching Email Dumps with Thunderbird
    Exercise 6-2: Configure Thunderbird for Email Dumps
    Reading Individual EML Files with Thunderbird
    Exercise 6-3: Import the Nauru Police Force EML Email Dump
    Searching Email in Thunderbird
    Quick Filter Searches
    The Search Messages Dialog
    Exercise 6-4: Import the Oath Keepers MBOX Email Dump
    Exercise 6-5: Import the Heritage Foundation PST Email Dump
    Other Tools for Researching Email Dumps
    Microsoft Outlook
    Aleph
    Summary
Part III: Python Programming
  7. An Introduction to Python
    Exercise 7-1: Install Python
    Windows
    Linux
    macOS
    Exercise 7-2: Write Your First Python Script
    Python Basics
    The Interactive Python Interpreter
    Comments
    Math with Python
    Strings
    Exercise 7-3: Write a Python Script with Variables, Math, and Strings
    Lists and Loops
    Defining and Printing Lists
    Running for Loops
    Control Flow
    Comparison Operators
    if Statements
    Nested Code Blocks
    Searching Lists
    Logical Operators
    Exception Handling
    Exercise 7-4: Practice Loops and Control Flow
    Functions
    The def Keyword
    Default Arguments
    Return Values
    Docstrings
    Exercise 7-5: Practice Writing Functions
    Summary
  8. Working with Data in Python
    Modules
    Python Script Template
    Exercise 8-1: Traverse the Files in BlueLeaks
    List the Filenames in a Folder
    Count the Files and Folders in a Folder
    Traverse Folders with os.walk()
    Exercise 8-2: Find the Largest Files in BlueLeaks
    Third-Party Modules
    Exercise 8-3: Practice Command Line Arguments with Click
    Avoiding Hardcoding with Command Line Arguments
    Exercise 8-4: Find the Largest Files in Any Dataset
    Dictionaries
    Defining Dictionaries
    Getting and Setting Values
    Navigating Dictionaries and Lists in the Conti Chat Logs
    Exploring Dictionaries and Lists Full of Data in Python
    Selecting Values in Dictionaries and Lists
    Analyzing Data Stored in Dictionaries and Lists
    Exercise 8-5: Map Out the CSVs in BlueLeaks
    Accept a Command Line Argument
    Loop Through the BlueLeaks Folders
    Fill Up the Dictionary
    Display the Output
    Reading and Writing Files
    Opening Files
    Writing Lines to a File
    Reading Lines from a File
    Exercise 8-6: Practice Reading and Writing Files
    Summary
Part IV: Structured Data
  9. BlueLeaks, Black Lives Matter, and the CSV File Format
    Installing Spreadsheet Software
    Introducing the CSV File Format
    Exploring CSV Files with Spreadsheet Software and Text Editors
    My BlueLeaks Investigation
    Focusing on a Fusion Center
    Introducing NCRIC
    Investigating a SAR
    Reading and Writing CSV Files in Python
    Exercise 9-1: Make BlueLeaks CSVs More Readable
    Accept the CSV Path as an Argument
    Loop Through the CSV Rows
    Display CSV Fields on Separate Lines
    How to Read Bulk Email from Fusion Centers
    Lists of Black Lives Matter Demonstrations
    “Intelligence” Memos from the FBI and DHS
    A Brief HTML Primer
    Exercise 9-2: Make Bulk Email Readable
    Accept the Command Line Arguments
    Create the Output Folder
    Define the Filename for Each Row
    Write the HTML Version of Each Bulk Email
    Discovering the Names and URLs of BlueLeaks Sites
    Exercise 9-3: Make a CSV of BlueLeaks Sites
    Open a CSV for Writing
    Find All the Company.csv Files
    Add BlueLeaks Sites to the CSV
    Summary
  10. BlueLeaks Explorer
    Undiscovered Revelations in BlueLeaks
    Exercise 10-1: Install BlueLeaks Explorer
    Create the Docker Compose Configuration File
    Bring Up the Containers
    Initialize the Databases
    The Structure of NCRIC
    Exploring Tables and Relationships
    Searching for Keywords
    Building Your Own BlueLeaks Structure
    Defining the JRIC Structure
    Showing Useful Fields
    Changing Field Types
    Adding JRIC’s Leads Table
    Building a Relationship
    Verifying BlueLeaks Data
    Exercise 10-2: Finish Building the Structure for JRIC
    The Technology Behind BlueLeaks Explorer
    The Backend
    The Frontend
    Summary
  11. Parler, the January 6 Insurrection, and the JSON File Format
    The Origins of the Parler Dataset
    How the Parler Videos Were Archived
    The Dataset’s Impact on Trump’s Second Impeachment
    Exercise 11-1: Download and Extract Parler Video Metadata
    Download the Metadata
    Uncompress and Download Individual Parler Videos
    Extract Parler Metadata
    The JSON File Format
    Understanding JSON Syntax
    Parsing JSON with Python
    Handling Exceptions with JSON
    Tools for Exploring JSON Data
    Counting Videos with GPS Coordinates Using grep
    Formatting and Searching Data with the jq Command
    Exercise 11-2: Write a Script to Filter for Videos with GPS from January 6, 2021
    Accept the Parler Metadata Path as an Argument
    Loop Through Parler Metadata Files
    Filter for Videos with GPS Coordinates
    Filter for Videos from January 6, 2021
    Working with GPS Coordinates
    Searching by Latitude and Longitude
    Converting Between GPS Coordinate Formats
    Calculating GPS Distance in Python
    Finding the Center of Washington, DC
    Exercise 11-3: Update the Script to Filter for Insurrection Videos
    Plotting GPS Coordinates on a Map with simplekml
    Exercise 11-4: Create KML Files to Visualize Location Data
    Create a KML File for All Videos with GPS Coordinates
    Create KML Files for Videos from January 6, 2021
    Visualizing Location Data with Google Earth
    Viewing Metadata with ExifTool
    Summary
  12. Epik Fail, Extremism Research, and SQL Databases
    The Structure of SQL Databases
    Relational Databases
    Clients and Servers
    Tables, Columns, and Types
    Exercise 12-1: Create and Test a MySQL Server Using Docker and Adminer
    Run the Server
    Connect to the Database with Adminer
    Create a Test Database
    Exercise 12-2: Query Your SQL Database
    INSERT Statements
    SELECT Statements
    JOIN Clauses
    UPDATE Statements
    DELETE Statements
    Introducing the MySQL Command Line Client
    Exercise 12-3: Install and Test the Command Line MySQL Client
    MySQL-Specific Queries
    The History of Epik
    The Epik Hack
    Epik’s WHOIS Data
    Exercise 12-4: Download and Extract Part of the Epik Dataset
    Exercise 12-5: Import Epik Data into MySQL
    Create a Database for api_system
    Import api_system Data
    Exploring Epik’s SQL Database
    The domain Table
    The privacy Table
    The hosting and hosting_server Tables
    Working with Epik Data in the Cloud
    Summary
Part V: Case Studies
  13. Pandemic Profiteers and Covid-19 Disinformation
    The Origins of AFLDS
    The Cadence Health and Ravkoo Datasets
    Extracting the Data into an Encrypted File Container
    Analyzing the Data with Command Line Tools
    Creating a Single Spreadsheet of Patients
    Calculating Revenue from Prescriptions Filled by Ravkoo
    Finding the Price and Quantity of Drugs Sold
    Categorizing Prescription Data by Drug
    A Deeper Look at the Cadence Health Patient Data
    Finding Cadence’s Partners
    Searching for Patients by City
    Searching for Patients by Age
    Authenticating the Data
    The Aftermath
    HIPAA’s Breach Notification Rule
    Congressional Investigation
    Simone Gold’s New Business Venture
    Scandal and Infighting at AFLDS
    Summary
  14. Neo-Nazis and Their Chatrooms
    How Antifascists Infiltrated Neo-Nazi Discord Servers
    Analyzing Leaked Chat Logs
    Making JSON Files Readable
    Exploring Objects, Keys, and Values with jq
    Converting Timestamps
    Finding Usernames
    The Discord History Tracker
    A Script to Search the JSON Files
    My Discord Analysis Code
    Designing the SQL Database
    Importing Chat Logs into the SQL Database
    Building the Web Interface
    Using Discord Analysis to Find Revelations
    The Pony Power Discord Server
    The Launch of DiscordLeaks
    The Aftermath
    The Lawsuit Against Unite the Right
    The Patriot Front Chat Logs
    Summary
Afterword
A. Solutions to Common WSL Problems
  Understanding WSL’s Linux Filesystem
  The Disk Performance Problem
  Solving the Disk Performance Problem
  Storing Only Active Datasets in Linux
  Storing Your Linux Filesystem on a USB Disk
  Next Steps
B. Scraping the Web
  Legal Considerations
  HTTP Requests
  Scraping Techniques
  Loading Pages with HTTPX
  Parsing HTML with Beautiful Soup
  Automating Web Browsers with Selenium
  Next Steps
Index

Data-science investigations have brought journalism into the 21st century, and, guided by The Intercept’s infosec expert Micah Lee, this book is your blueprint for uncovering hidden secrets in hacked datasets.
We live in an age where hacking and whistleblowing can unearth secrets that alter history. Hacks, Leaks, and Revelations is your toolkit for uncovering new stories and hidden truths. Crack open your laptop, plug in a hard drive, and get ready to change history.

In the current age of hacking and whistleblowing, the internet contains massive troves of leaked information. These complex datasets can be goldmines of revelations in the public interest if you know how to access and analyze them. For investigative journalists, hacktivists, and amateur researchers alike, this book provides the technical expertise needed to find and transform unintelligible files into groundbreaking reports.

Guided by renowned investigative journalist and infosec expert Micah Lee, who helped secure Edward Snowden’s communications with the press, you’ll learn the tools, technologies, and programming basics needed to crack open and interrogate datasets freely available on the internet or your own private datasets obtained directly from sources.

Each chapter features hands-on exercises using real hacked data from governments, companies, and political groups, as well as interesting nuggets from datasets that never made it into published stories. You’ll dig into hacked files from the BlueLeaks law enforcement records, analyze social-media traffic related to the 2021 attack on the U.S. Capitol, and get the exclusive story of privately leaked data from anti-vaccine group America’s Frontline Doctors. Along the way, you’ll ...

Introduction
Part 1: Sources and Datasets
  Chapter 1: Protecting Sources and Yourself
  Chapter 2: Acquiring Datasets
Part 2: Tools of the Trade
  Chapter 3: The Command Line Interface
  Chapter 4: Exploring Datasets in the Terminal
  Chapter 5: Docker, Aleph, and Making Datasets Searchable
  Chapter 6: Reading Other People’s Emails
Part 3: Writing Code
  Chapter 7: An Introduction to Python
  Chapter 8: Working with Data in Python
Part 4: Structured Data
  Chapter 9: BlueLeaks, Black Lives Matter, and the CSV File Format
  Chapter 10: BlueLeaks Explorer
  Chapter 11: Parler, the Insurrection of January 6, and the JSON File Format
  Chapter 12: Epik Fail, Extremism Research, and SQL Databases
Part 5: Case Studies
  Chapter 13: Pandemic Profiteers and COVID-19 Disinformation
  Chapter 14: Neo-Nazis and Their Chat Rooms
Afterword
Appendixes
  Appendix A: Using the Windows Subsystem for Linux
  Appendix B: Scraping the Web
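
The description mentions writing a Python script that scours video metadata to determine where videos were filmed, and the contents list covers filtering videos by GPS coordinates and calculating GPS distance in Python. The following is a minimal sketch of that general idea, not the book's own code. The record layout (`id`, `lat`, `lon`), the sample records, the approximate reference point, and the 1 km radius are all assumptions made for illustration; real metadata files would likely use different field names.

```python
import json
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, used by the haversine formula


def haversine_km(lat1, lon1, lat2, lon2):
    """Return the great-circle distance between two points in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def videos_near(records, center_lat, center_lon, radius_km):
    """Return the records whose GPS coordinates lie within radius_km."""
    nearby = []
    for record in records:
        lat, lon = record.get("lat"), record.get("lon")
        if lat is None or lon is None:
            continue  # skip records without GPS coordinates
        if haversine_km(lat, lon, center_lat, center_lon) <= radius_km:
            nearby.append(record)
    return nearby


# Hypothetical sample metadata; field names and values are invented.
SAMPLE_JSON = """[
    {"id": "video-a", "lat": 38.8899, "lon": -77.0091},
    {"id": "video-b", "lat": 40.7128, "lon": -74.0060},
    {"id": "video-c"}
]"""

if __name__ == "__main__":
    records = json.loads(SAMPLE_JSON)
    # Approximate coordinates near the US Capitol (an assumption).
    for record in videos_near(records, 38.8899, -77.0091, 1.0):
        print(record["id"])
```

Records without coordinates are skipped rather than treated as errors, since large metadata collections often include entries with no location data.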