Balian Blog

Frontiers in massive data analysis

An introduction to the book "Frontiers in massive data analysis" by National Research Council (U.S.). Division on Engineering and Physical Sciences; National Research Council (Estados Unidos); National Research Council (U.S.). Committee on Applied and Theoretical Statistics; National Research Council (U.S.). Committee on the Analysis of Massive Data; and National Research Council (U.S.). Board on Mathematical Sciences and Their Applications, published by The National Academies Press in 2013. The book is available in PDF format, in English. "Frontiers in massive data analysis" is listed under the category "Uncategorized".

Data mining of massive data sets is transforming the way we think about crisis response, marketing, entertainment, cybersecurity and national intelligence. Collections of documents, images, videos, and networks are being thought of not merely as bit strings to be stored, indexed, and retrieved, but as potential sources of discovery and knowledge, requiring sophisticated analysis techniques that go far beyond classical indexing and keyword counting, aiming to find relational and semantic interpretations of the phenomena underlying the data. Frontiers in Massive Data Analysis examines the frontier of analyzing massive amounts of data, whether in a static database or streaming through a system. Data at that scale--terabytes and petabytes--is increasingly common in science (e.g., particle physics, remote sensing, genomics), Internet commerce, business analytics, national security, communications, and elsewhere. The tools that work to infer knowledge from data at smaller scales do not necessarily work, or work well, at such massive scale. New tools, skills, and approaches are necessary, and this report identifies many of them, plus promising research directions to explore. Frontiers in Massive Data Analysis discusses pitfalls in trying to infer knowledge from massive data, and it characterizes seven major classes of computation that are common in the analysis of massive data. Overall, this report illustrates the cross-disciplinary knowledge--from computer science, statistics, machine learning, and application disciplines--that must be brought to bear to make useful inferences from massive data.

"With information available from Internet sites around the globe and flowing over communication networks connecting billions of devices, today's society has access to an enormous amount of data. Scientific communities and the defense and intelligence enterprise are also generating massive amounts of data from experiments, observations, and numerical simulations.
Some Internet-based companies are dealing with data measured in exabytes (a billion billion bytes), and many other sources are producing terabytes or even petabytes of data. While systems have been developed to store and manage such massive amounts of data, some of which streams by and is only examined "on the fly," our ability to infer knowledge from data at this scale is limited. A major challenge is developing statistically well-founded procedures that allow us to control the inevitable errors; many traditional tools of data analysis are not feasible at this scale. Frontiers in Massive Data Analysis describes the cross-disciplinary skill set that data analysts need to address the challenges of exploiting big data. It identifies gaps in current capabilities and recommends promising research directions in multiple component areas, ranging from data representation to methods for including humans in the data-analysis loop. The report also proposes a list of key computational problems, the "seven computational giants" of massive data analysis" -- Back cover

Contents: Massive data in science, technology, commerce, national defense, telecommunications, and other endeavors -- Scaling the infrastructure for data management -- Temporal data and real-time algorithms -- Large-scale data representations -- Resources, trade-offs, and limitations -- Building models from massive data -- Sampling and massive data -- Human interaction with data -- The seven computational giants of massive data analysis.