Lecture 8 - Big Data

Data Lake

- A large data pool in which the schema and data requirements are not defined until the data is queried
- While a data warehouse stores data in a predefined target structure with detailed metadata, a data lake is a storage repository that holds a vast amount of raw data in its native format until it is needed
- Serves as a corporate big data repository
- A suitable concept for storing big data, since big data is inherently less structured and typically kept in its raw format
- Big data analysis can be performed on data in data lakes or directly on the original sources
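
The idea that the schema is not defined until the data is queried (often called schema-on-read) can be sketched in a few lines of Python. This is a minimal illustration, assuming the "lake" is just a list of raw JSON strings; the sample records and the `query` helper are hypothetical. A real data lake would sit on a distributed file system or object store.

```python
import json
from typing import Any, Dict, List

# Hypothetical raw records: they land in the "lake" as-is, with no schema
# enforced when they are written.
RAW_LAKE = [
    '{"user": "ana", "action": "click", "ts": "2021-03-01T10:00:00"}',
    '{"user": "ben", "action": "view"}',
    '{"device": "sensor-7", "temp_c": 21.5}',
]


def query(raw_lines: List[str], fields: List[str]) -> List[Dict[str, Any]]:
    """Apply a schema at read time: keep records that have all requested fields."""
    rows = []
    for line in raw_lines:
        record = json.loads(line)  # raw data is parsed only when it is queried
        if all(f in record for f in fields):
            rows.append({f: record[f] for f in fields})
    return rows
```

For example, `query(RAW_LAKE, ["user", "action"])` returns the two user-activity records and skips the sensor reading, while `query(RAW_LAKE, ["device", "temp_c"])` returns only the sensor reading. The same raw data serves both questions because the structure is imposed by the query, not by the storage.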

Big Data - DWH vs. Data Lake

Analogy: Agriculture vs. Hunting/Gathering
- We can think of a data warehouse as an area of land devoted to agriculture
- We can think of a data lake as a wild area of equivalent size that is simply fenced off and designated for hunting and gathering
- Preparing the hunting ground (the data lake) is much simpler and cheaper, but the yield is smaller, less predictable, and different in nature
- A fully developed data warehouse and a data lake are at opposite ends of a broad spectrum of solutions for large analytical data repositories

Big Data Techniques

MapReduce is a common big data technique
- Parallel computing divides a complex task into a set of smaller tasks that are performed simultaneously on multiple computers
- Using multiple computers at the same time (parallel computing) vastly reduces the time needed for processing
- The MapReduce technique uses regular commodity (i.e., cheap) computers
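
To make the split into smaller parallel tasks concrete, here is a single-machine word-count sketch of the map, shuffle, and reduce steps. It is an illustration only, assuming each input string stands for a block of data held by a different commodity computer; a local thread pool stands in for the cluster (so it shows the structure rather than real speedups), and the function names are hypothetical, not part of Hadoop or any MapReduce framework.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Tuple


def map_phase(chunk: str) -> List[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every lowercased word in one chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle_phase(mapped: Iterable[List[Tuple[str, int]]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key (word)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for pairs in mapped:
        for word, count in pairs:
            groups[word].append(count)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: combine the values for each key into a single result."""
    return {word: sum(counts) for word, counts in groups.items()}


def word_count(chunks: List[str], workers: int = 4) -> Dict[str, int]:
    # Each chunk stands in for a data block on a separate machine; here the
    # independent map tasks run on a local thread pool instead of a cluster.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, chunks))
    return reduce_phase(shuffle_phase(mapped))
```

For example, `word_count(["big data big", "data lake", "Big ideas"])` gives `{'big': 3, 'data': 2, 'lake': 1, 'ideas': 1}`. The map tasks never depend on one another, which is what lets a real framework spread them across many cheap machines.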

Characteristics of Big Data

Massive volumes of diverse and rapidly growing data that are not formally modeled
- Characteristics of big data: Volume, Velocity, Variety
- Heterogeneous:
  - From various sources, such as smart devices, social media, sensors, etc.
  - In a variety of formats, such as:
    - Semi-structured: web logs, emails, tweets, etc.
    - Unstructured: text, video, audio, etc.
- Not modeled up front for predetermined operational and/or analytical queries (retrievals)
- Can encompass 80-90% (or even more) of an organization's data
- Some of it may be useful, but most of it will not be
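
The difference between semi-structured and unstructured data can be shown with a web log. A log line has no formal schema, but its delimiters (brackets, quotes, spaces) let fields be recovered with a pattern; free text has no such markers. This is a sketch assuming a log layout similar to the Common Log Format; the pattern and the `parse_log_line` helper are hypothetical.

```python
import re
from typing import Dict, Optional

# Assumed log layout (similar to the Common Log Format), e.g.:
# 203.0.113.5 - - [01/Mar/2021:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 512
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_log_line(line: str) -> Optional[Dict[str, str]]:
    """Recover named fields from a semi-structured log line, or None if it doesn't fit."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None
```

Parsing the sample line in the comment yields fields such as `ip`, `method`, `path`, and `status`, while an unstructured sentence like "Great talk on data lakes today!" returns `None`: the structure has to come from the data's own markers, not from a predefined model.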

Big Data Methods

Standard database and data warehousing techniques cannot adequately deal with the diversity and volume of big data
- Big data methods do not replace the database and data warehousing approaches developed for managing and utilizing formally modeled data assets
- Instead, they allow organizations to analyze and gain insight from the kinds of data that are not suited to regular database and data warehouse techniques

Three Types of Data Stored in Corporations and Organizations

- Transactional structured data: operational databases; data modeled/structured and stored for anticipated, predetermined operational use
- Analytical structured data: data warehouses and data marts; data modeled/structured and stored for anticipated, predetermined analytical use
- Unstructured/semi-structured, un-modeled data: big data

Big Data

Part of the overall data strategy, not a separate, isolated initiative

Big Data as a term