Introducing Data Science : Big data, machine learning, and more, using Python tools
About the book "Introducing Data Science : Big data, machine learning, and more, using Python tools" by Arno D. B. Meysman, Davy Cielen & Mohamed Ali, published by Manning Publications Co. LLC in 2016. The book is offered in PDF format, in English. "Introducing Data Science : Big data, machine learning, and more, using Python tools" is listed under the category Uncategorized.
Introducing Data Science teaches you how to accomplish the fundamental tasks that occupy data scientists. Using the Python language and common Python libraries, you'll experience firsthand the challenges of dealing with data at scale and gain a solid foundation in data science.

Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.

About the Technology
Many companies need developers with data science skills to work on projects ranging from social media marketing to machine learning. Discovering what you need to learn to begin a career as a data scientist can seem bewildering. This book is designed to help you get started.

About the Book
Introducing Data Science explains vital data science concepts and teaches you how to accomplish the fundamental tasks that occupy data scientists. You'll explore data visualization, graph databases, the use of NoSQL, and the data science process. You'll use the Python language and common Python libraries as you experience firsthand the challenges of dealing with data at scale. Discover how Python allows you to gain insights from data sets so big that they need to be stored on multiple machines, or from data moving so quickly that no single machine can handle it. This book gives you hands-on experience with the most popular Python data science libraries, Scikit-learn and StatsModels. After reading this book, you'll have the solid foundation you need to start a career in data science.

What's Inside
  Handling large data
  Introduction to machine learning
  Using Python to work with data
  Writing data science algorithms

About the Reader
This book assumes you're comfortable reading code in Python or a similar language, such as C, Ruby, or JavaScript. No prior experience with data science is required.

About the Authors
Davy Cielen, Arno D. B. Meysman, and Mohamed Ali are the founders and managing partners of Optimately and Maiton, where they focus on developing data science projects and solutions in various sectors.

Table of Contents

Front cover
brief contents
contents
preface
acknowledgments
about this book
  Roadmap
  Whom this book is for
  Code conventions and downloads
about the authors
Author Online
about the cover illustration

1 Data science in a big data world
  1.1 Benefits and uses of data science and big data
  1.2 Facets of data
    1.2.1 Structured data
    1.2.2 Unstructured data
    1.2.3 Natural language
    1.2.4 Machine-generated data
    1.2.5 Graph-based or network data
    1.2.6 Audio, image, and video
    1.2.7 Streaming data
  1.3 The data science process
    1.3.1 Setting the research goal
    1.3.2 Retrieving data
    1.3.3 Data preparation
    1.3.4 Data exploration
    1.3.5 Data modeling or model building
    1.3.6 Presentation and automation
  1.4 The big data ecosystem and data science
    1.4.1 Distributed file systems
    1.4.2 Distributed programming framework
    1.4.3 Data integration framework
    1.4.4 Machine learning frameworks
    1.4.5 NoSQL databases
    1.4.6 Scheduling tools
    1.4.7 Benchmarking tools
    1.4.8 System deployment
    1.4.9 Service programming
    1.4.10 Security
  1.5 An introductory working example of Hadoop
  1.6 Summary

2 The data science process
  2.1 Overview of the data science process
    2.1.1 Don't be a slave to the process
  2.2 Step 1: Defining research goals and creating a project charter
    2.2.1 Spend time understanding the goals and context of your research
    2.2.2 Create a project charter
  2.3 Step 2: Retrieving data
    2.3.1 Start with data stored within the company
    2.3.2 Don't be afraid to shop around
    2.3.3 Do data quality checks now to prevent problems later
  2.4 Step 3: Cleansing, integrating, and transforming data
    2.4.1 Cleansing data
    2.4.2 Correct errors as early as possible
    2.4.3 Combining data from different data sources
    2.4.4 Transforming data
  2.5 Step 4: Exploratory data analysis
  2.6 Step 5: Build the models
    2.6.1 Model and variable selection
    2.6.2 Model execution
    2.6.3 Model diagnostics and model comparison
  2.7 Step 6: Presenting findings and building applications on top of them
  2.8 Summary

3 Machine learning
  3.1 What is machine learning and why should you care about it?
    3.1.1 Applications for machine learning in data science
    3.1.2 Where machine learning is used in the data science process
    3.1.3 Python tools used in machine learning
  3.2 The modeling process
    3.2.1 Engineering features and selecting a model
    3.2.2 Training your model
    3.2.3 Validating a model
    3.2.4 Predicting new observations
  3.3 Types of machine learning
    3.3.1 Supervised learning
    3.3.2 Unsupervised learning
  3.4 Semi-supervised learning
  3.5 Summary

4 Handling large data on a single computer
  4.1 The problems you face when handling large data
  4.2 General techniques for handling large volumes of data
    4.2.1 Choosing the right algorithm
    4.2.2 Choosing the right data structure
    4.2.3 Selecting the right tools
  4.3 General programming tips for dealing with large data sets
    4.3.1 Don't reinvent the wheel
    4.3.2 Get the most out of your hardware
    4.3.3 Reduce your computing needs
  4.4 Case study 1: Predicting malicious URLs
    4.4.1 Step 1: Defining the research goal
    4.4.2 Step 2: Acquiring the URL data
    4.4.3 Step 4: Data exploration
    4.4.4 Step 5: Model building
  4.5 Case study 2: Building a recommender system inside a database
    4.5.1 Tools and techniques needed
    4.5.2 Step 1: Research question
    4.5.3 Step 3: Data preparation
    4.5.4 Step 5: Model building
    4.5.5 Step 6: Presentation and automation
  4.6 Summary

5 First steps in big data
  5.1 Distributing data storage and processing with frameworks
    5.1.1 Hadoop: a framework for storing and processing large data sets
    5.1.2 Spark: replacing MapReduce for better performance
  5.2 Case study: Assessing risk when loaning money
    5.2.1 Step 1: The research goal
    5.2.2 Step 2: Data retrieval
    5.2.3 Step 3: Data preparation
    5.2.4 Step 4: Data exploration & Step 6: Report building
  5.3 Summary

6 Join the NoSQL movement
  6.1 Introduction to NoSQL
    6.1.1 ACID: the core principle of relational databases
    6.1.2 CAP Theorem: the problem with DBs on many nodes
    6.1.3 The BASE principles of NoSQL databases
    6.1.4 NoSQL database types
  6.2 Case study: What disease is that?
    6.2.1 Step 1: Setting the research goal
    6.2.2 Steps 2 and 3: Data retrieval and preparation
    6.2.3 Step 4: Data exploration
    6.2.4 Step 3 revisited: Data preparation for disease profiling
    6.2.5 Step 4 revisited: Data exploration for disease profiling
    6.2.6 Step 6: Presentation and automation
  6.3 Summary

7 The rise of graph databases
  7.1 Introducing connected data and graph databases
    7.1.1 Why and when should I use a graph database?
  7.2 Introducing Neo4j: a graph database
    7.2.1 Cypher: a graph query language
  7.3 Connected data example: a recipe recommendation engine
    7.3.1 Step 1: Setting the research goal
    7.3.2 Step 2: Data retrieval
    7.3.3 Step 3: Data preparation
    7.3.4 Step 4: Data exploration
    7.3.5 Step 5: Data modeling
    7.3.6 Step 6: Presentation
  7.4 Summary

8 Text mining and text analytics
  8.1 Text mining in the real world
  8.2 Text mining techniques
    8.2.1 Bag of words
    8.2.2 Stemming and lemmatization
    8.2.3 Decision tree classifier
  8.3 Case study: Classifying Reddit posts
    8.3.1 Meet the Natural Language Toolkit
    8.3.2 Data science process overview and step 1: The research goal
    8.3.3 Step 2: Data retrieval
    8.3.4 Step 3: Data preparation
    8.3.5 Step 4: Data exploration
    8.3.6 Step 3 revisited: Data preparation adapted
    8.3.7 Step 5: Data analysis
    8.3.8 Step 6: Presentation and automation
  8.4 Summary

9 Data visualization to the end user
  9.1 Data visualization options
  9.2 Crossfilter, the JavaScript MapReduce library
    9.2.1 Setting up everything
    9.2.2 Unleashing Crossfilter to filter the medicine data set
  9.3 Creating an interactive dashboard with dc.js
  9.4 Dashboard development tools
  9.5 Summary

Appendix A: Setting up Elasticsearch
  A.1 Linux installation
  A.2 Windows installation
Appendix B: Setting up Neo4j
  B.1 Linux installation
  B.2 Windows installation
Appendix C: Installing MySQL server
  C.1 Windows installation
  C.2 Linux installation
Appendix D: Setting up Anaconda with a virtual environment
  D.1 Linux installation
  D.2 Windows installation
  D.3 Setting up the environment

index
Back cover