Balian Blog

Mastering Java Machine Learning: A Java developer's guide to implementing machine learning and big data architectures

An introduction to the book "Mastering Java Machine Learning: A Java developer's guide to implementing machine learning and big data architectures" by Dr. Uday Kamath and Krishna Choppella, published by Packt Publishing - ebooks Account in 2017. The book is offered in 5 pages, in PDF format, in English. "Mastering Java Machine Learning: A Java developer's guide to implementing machine learning and big data architectures" is listed in the Uncategorized category.

Cover; Copyright; Credits; Foreword; About the Authors; About the Reviewers; www.PacktPub.com; Customer Feedback; Table of Contents; Preface

Chapter 1: Machine Learning Review
Machine learning – history and definition; What is not machine learning?; Machine learning – concepts and terminology; Machine learning – types and subtypes; Datasets used in machine learning; Machine learning applications; Practical issues in machine learning; Process; Machine learning – tools and datasets; Datasets; Summary

Chapter 2: Practical Approach to Real-World Supervised Learning
Formal description and notation; Basic label analysis; Univariate feature analysis; Data transformation and preprocessing; Handling missing values; Outliers; Discretization; Is sampling needed?; Undersampling and oversampling; Training, validation, and test set; Filter approach; Embedded approach; Linear Regression; Naïve Bayes; Logistic Regression; Decision Trees; K-Nearest Neighbors (KNN); Support vector machines (SVM); Ensemble learning and meta learners; Bootstrap aggregating or bagging; Boosting; Model assessment; Model evaluation metrics; Confusion matrix and related metrics; ROC and PRC curves; Model comparisons; Comparing two algorithms; Comparing multiple algorithms; Case Study – Horse Colic Classification; Machine learning mapping; Features analysis; Weka experiments; RapidMiner experiments; Results, observations, and analysis; Summary; References

Chapter 3: Unsupervised Machine Learning Techniques
Issues in common with supervised learning; Notation; Principal component analysis (PCA); Random projections (RP); Multidimensional Scaling (MDS); Kernel Principal Component Analysis (KPCA); Manifold learning; k-Means; DBSCAN; Mean shift; Expectation maximization (EM) or Gaussian mixture modeling (GMM); Hierarchical clustering; Self-organizing maps (SOM); Spectral clustering; Affinity propagation; Internal evaluation measures; External evaluation measures; Outlier algorithms; Statistical-based; Distance-based methods; Density-based methods; Clustering-based methods; High-dimensional-based methods; One-class SVM; Supervised evaluation; Tools and software; Data quality analysis; Data sampling and transformation; Feature analysis and dimensionality reduction; Clustering models, results, and evaluation; Observations and clustering analysis; Outlier models, results, and evaluation; Summary; References

Chapter 4: Semi-Supervised and Active Learning
Semi-supervised learning; Representation, notation, and assumptions; Self-training SSL; Co-training SSL or multi-view SSL; Cluster and label SSL; Transductive graph label propagation; Transductive SVM (TSVM); Tools and software; Business problem; Datasets and analysis; Experiments and results; Active learning; Active learning approaches; Uncertainty sampling; Query by disagreement (QBD); Advantages and limitations; How does it work?; Advantages and limitations; Data Collection; Feature analysis and dimensionality reduction; Models, results, and evaluation; Pool-based scenarios; Stream-based scenarios; Analysis of active learning results; Summary; References

Chapter 5: Real-Time Stream Machine Learning
Assumptions and mathematical notations; Basic stream processing and computational techniques; Stream computations; Sliding windows; Sampling; Concept drift and drift detection; Partial memory; Detection methods; Adaptation methods; Linear algorithms; Non-linear algorithms; Ensemble algorithms; Model validation techniques; Incremental unsupervised learning using clustering; Partition based; Hierarchical based and micro clustering; Density based; Grid based; Validation and evaluation techniques; Inputs and outputs; Distance-based clustering for outlier detection; How does it work?; Tools and software; Data sampling and transformation; Models, results, and evaluation; Supervised learning experiments; Clustering experiments; Outlier detection experiments; Analysis of stream learning results; Summary; References

Chapter 6: Probabilistic Graph Modeling
Chain rule and Bayes' theorem; Random variables, joint, and marginal distributions; Marginal independence and conditional independence; Distribution queries; Graph concepts; Graph structure and properties; Bayesian networks; Reasoning patterns; Independencies, flow of influence, D-Separation, I-Map; Inference; Elimination-based inference; Propagation-based techniques; Sampling-based techniques; Learning; Learning parameters; Learning structures; Parameterization; Independencies; Learning; Conditional random fields; Tree augmented network; Markov chains; Hidden Markov models; Most probable path in HMM; Posterior decoding in HMM; Tools and usage; OpenMarkov; Weka Bayesian Network GUI; Machine learning mapping; Feature analysis; Models, results, and evaluation; Analysis of results; Summary; References

Chapter 7: Deep Learning
Inputs, neurons, activation function, and mathematical notation; Structure and mathematical notations; Activation functions in NN; Training neural network; Vanishing gradients, local optimum, and slow training; Rectified linear activation function; Restricted Boltzmann Machines; Autoencoders; Unsupervised pre-training and supervised fine-tuning; Deep feed-forward NN; Deep Autoencoders; Deep Belief Networks; Deep learning with dropouts; Sparse coding; Convolutional Neural Network; CNN Layers; Recurrent Neural Networks; Tools and software; Feature analysis; Basic data handling; Multi-layer perceptron; Convolutional Network; Variational Autoencoder; DBN; Parameter search using Arbiter; Results and analysis; Summary; References

Chapter 8: Text Mining and Natural Language Processing
NLP, subfields, and tasks; Text clustering; Information extraction and named entity recognition; Word sense disambiguation; Automating question and answers; Text processing components and transformations; How does it work?; Inputs and outputs; Stemming or lemmatization; Local/global dictionary or vocabulary; Lexical features; Syntactic features; Vector space model; Similarity measures; Feature selection and dimensionality reduction; Feature selection; Dimensionality reduction; Text categorization/classification; Probabilistic latent semantic analysis (PLSA); Clustering techniques; Evaluation of text clustering; Hidden Markov models for NER; Maximum entropy Markov models for NER; Deep learning and NLP; Mallet; KNIME; Topic modeling with mallet; Machine Learning mapping; Data sampling and transformation; Feature analysis and dimensionality reduction; Models, results, and evaluation; Summary; References

Chapter 9: Big Data Machine Learning – The Final Frontier
What are the characteristics of Big Data?; General Big Data framework; Big Data cluster deployment frameworks; Data acquisition; Data storage; Data processing and preparation; Visualization and analysis; H2O as Big Data Machine Learning platform; H2O architecture; Tools and usage; Business problem; Experiments, results, and analysis; Spark architecture; Machine Learning in MLlib; Experiments, results, and analysis; Real-time Big Data Machine Learning; SAMOA as a real-time Big Data Machine Learning framework; Machine Learning algorithms; Tools and usage; The future of Machine Learning; Summary; References

Vector; Transpose of a matrix; Matrix multiplication; Singular value decomposition (SVD); Bayes' theorem; Mean; Standard deviation; Covariance; Binomial distribution; Gaussian distribution; Error propagation; Index

Become an advanced practitioner with this progressive set of master classes on application-oriented machine learning

About This Book
Comprehensive coverage of key topics in machine learning with an emphasis on both the theoretical and practical aspects.
More than 15 open source Java tools in a wide range of techniques, with code and practical usage.
More than 10 real-world case studies in machine learning highlighting techniques ranging from data ingestion up to analyzing the results of experiments, all preparing the user for the practical, real-world use of tools and data analysis.

Who This Book Is For
This book will appeal to anyone with a serious interest in topics in Data Science or those already working in related areas: ideally, intermediate-level data analysts and data scientists with experience in Java.
Preferably, you have experience with the fundamentals of machine learning, want to explore the area further, are up to grappling with the mathematical complexities of its algorithms, and wish to learn the complete ins and outs of practical machine learning.

What You Will Learn
Master key Java machine learning libraries, and what kind of problem each can solve, with theory and practical guidance.
Explore powerful techniques in each major category of machine learning such as classification, clustering, anomaly detection, graph modeling, and text mining.
Apply machine learning to real-world data with methodologies, processes, applications, and analysis.
Techniques and experiments developed around the latest specializations in machine learning, such as deep learning, stream data mining, and active and semi-supervised learning.
Build high-performing, real-time, adaptive predictive models for batch- and stream-based big data learning using the latest tools and methodologies.
Get a deeper understanding of technologies leading towards a more powerful AI applicable in various domains such as Security, Financial Crime, Internet of Things, social networking, and so on.

In Detail
Java is one of the main languages used by practicing data scientists; much of the Hadoop ecosystem is Java-based, and it is certainly the language that most production systems in Data Science are written in. If you know Java, Mastering Java Machine Learning is your next step on the path to becoming an advanced practitioner in Data Science. This book aims to introduce you to an array of advanced techniques in machine learning, including classifi..

A general framework for constructing and using probabilistic models of complex systems that would enable a computer to use available information for making decisions. Most tasks require a person or an automated system to reason: to reach conclusions based on available information.
The framework of probabilistic graphical models, presented in this book, provides a general approach for this task. The approach is model-based, allowing interpretable models to be constructed and then manipulated by reasoning algorithms. These models can also be learned automatically from data, allowing the approach to be used in cases where manually constructing a model is difficult or even impossible. Because uncertainty is an inescapable aspect of most real-world applications, the book focuses on probabilistic models, which make the uncertainty explicit and provide models that are more faithful to reality. Probabilistic Graphical Models discusses a variety of models, spanning Bayesian networks, undirected Markov networks, discrete and continuous models, and extensions to deal with dynamical systems and relational data. For each class of models, the text describes the three fundamental cornerstones: representation, inference, and learning, presenting both basic concepts and advanced techniques. Finally, the book considers the use of the proposed framework for causal reasoning and decision making under uncertainty.

The main text in each chapter provides the detailed technical development of the key ideas. Most chapters also include boxes with additional material: skill boxes, which describe techniques; case study boxes, which discuss empirical cases related to the approach described in the text, including applications in computer vision, robotics, natural language understanding, and computational biology; and concept boxes, which present significant concepts drawn from the material in the chapter. Instructors (and readers) can group chapters in various combinations, from core topics to more technically advanced material, to suit their particular needs.

"Since the beginning of the Internet age and the increased use of ubiquitous computing devices, the large volume and continuous flow of distributed data have imposed new constraints on the design of learning algorithms. Exploring how to extract knowledge structures from evolving and time-changing data, Knowledge Discovery from Data Streams presents a coherent overview of state-of-the-art research in learning from data streams. The book covers the fundamentals that are imperative to understanding data streams and describes important applications, such as TCP/IP traffic, GPS data, sensor networks, and customer click streams. It also addresses several challenges of data mining in the future, when stream mining will be at the core of many applications. These challenges involve designing useful and efficient data mining solutions applicable to real-world problems. In the appendix, the author includes examples of publicly available software and online data sets. This practical, up-to-date book focuses on the new requirements of the next generation of data mining. Although the concepts presented in the text are mainly about data streams, they also are valid for different areas of machine learning and data mining." -- Publisher's description

Mastering Java Machine Learning aims to introduce you to an array of advanced techniques in machine learning, including classification, clustering, anomaly detection, stream learning, active learning, semi-supervised learning, probabilistic graph modeling, text mining, deep learning, and big data batch and stream machine learning. Accompanying each chapter are illustrative examples and real-world case studies that show how to apply the newly learned techniques using sound methodologies and the best Java-based tools available today. On completing this book, you will have an understanding of the tools and techniques for building powerful machine learning models to solve data science problems in just about any domain.

1. Introduction -- 2. Foundations
I. Representation -- 3. Bayesian Network Representation -- 4. Undirected Graphical Models -- 5. Local Probabilistic Models -- 6. Template-based Representations -- 7. Gaussian Network Models -- 8. Exponential Family
II. Inference -- 9. Exact Inference: Variable Elimination -- 10. Exact Inference: Clique Trees -- 11. Inference as Optimization -- 12. Particle-based Approximate Inference -- 13. MAP Inference -- 14. Inference in Hybrid Networks -- 15. Inference in Temporal Models
III. Learning -- 16. Learning Graphical Models: Overview -- 17. Parameter Estimation -- 18. Structure Learning in Bayesian Networks -- 19. Partially Observed Data -- 20. Learning Undirected Models
IV. Actions and Decisions -- 21. Causality -- 22. Utilities and Decisions -- 23. Structured Decision Problems -- 24. Epilogue
A. Background Material.
Daphne Koller and Nir Friedman. Includes bibliographical references (p. [1171]-1207) and indexes.

Proceedings of the annual Conference on Uncertainty in Artificial Intelligence, available for 1991-present.
Since 1985, the Conference on Uncertainty in Artificial Intelligence (UAI) has been the primary international forum for exchanging results on the use of principled uncertain-reasoning methods in intelligent systems. The UAI Proceedings have become a basic reference for researchers and practitioners who want to know about both theoretical advances and the latest applied developments in the field.