Lecture 8 - Big Data
DATA WAREHOUSE vs. DATA LAKE
-Data lake: data kept in its native format, with no transformation -Data warehouse: complete ETL (extract, transform, load) infrastructure in place; data transformed and integrated for analysis
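A minimal Python sketch of the contrast above, using made-up sample records (the record fields and function names are hypothetical, chosen only for illustration). The "lake" keeps each record exactly as it arrived, and a schema is applied only when a query reads it. The "warehouse" runs an ETL step up front that forces every record into one fixed, typed schema, dropping anything the schema does not define.

```python
import json
from datetime import date

# Hypothetical raw records from different sources (illustrative data only).
# Note the second record carries an extra "channel" field.
RAW_RECORDS = [
    '{"customer": "Ann", "amount": "19.99", "date": "2024-03-01"}',
    '{"customer": "Bob", "amount": "5.00", "date": "2024-03-02", "channel": "web"}',
]

def load_into_lake(raw_records):
    """Data lake: store each record as received (native format, no transformation)."""
    return list(raw_records)

def query_lake_total(lake):
    """Schema-on-read: records are parsed and typed only when this query runs."""
    return sum(float(json.loads(text)["amount"]) for text in lake)

def etl_into_warehouse(raw_records):
    """Data warehouse: extract, transform to a fixed schema, then load typed rows."""
    rows = []
    for text in raw_records:
        record = json.loads(text)                 # extract
        row = (                                   # transform to (customer, amount, date)
            record["customer"].strip(),
            float(record["amount"]),
            date.fromisoformat(record["date"]),
        )                                         # fields outside the schema are dropped
        rows.append(row)                          # load
    return rows

if __name__ == "__main__":
    lake = load_into_lake(RAW_RECORDS)
    warehouse = etl_into_warehouse(RAW_RECORDS)
    print(lake[1])          # raw text, still includes "channel"
    print(warehouse[1])     # typed 3-tuple, "channel" is gone
    print(query_lake_total(lake))
```

The trade-off shows up directly: the lake accepts anything cheaply but pushes parsing work (and the risk of malformed data) to query time, while the warehouse pays the ETL cost once so that every later analytical query sees clean, integrated rows.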
Big Data
-Heterogeneous -From various sources such as smart devices, social media, sensors, etc. -Variety of formats such as web logs, e-mails, tweets, text, video, audio, etc.
Data Lake
-Large data pool in which the schema and data requirements are not defined until the data is queried (schema-on-read) -Serves as a corporate big data repository
Big Data
-Not modeled up front for pre-determined operational and/or analytical queries (retrievals) -Can encompass 80%-90% (or even more) of stored data -Some of it may be useful, but much of it may not be -Companies/organizations tend to store a lot of it anyway, knowing that some of it may prove useful later
Three Types of Data Stored in Corporations and Organizations
-Transactional Structured Data -Analytical Structured Data -Unstructured/Semi-structured, Unmodeled Data
MapReduce
-A common big data technique based on parallel computing -Divides a complex task into many smaller tasks that are performed in parallel on multiple computers (map), then combines their partial results into the final answer (reduce)
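The classic illustration of MapReduce is counting words. Below is a single-machine Python sketch of the three phases (map, shuffle, reduce); the function names are hypothetical, and a thread pool merely stands in for the multiple computers a real framework such as Hadoop would use. It shows the structure of the technique, not any particular framework's API.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def map_phase(chunk):
    """Map: turn one chunk of text into (word, 1) pairs, independently of other chunks."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle_phase(mapped_outputs):
    """Shuffle: group every emitted value by its key (the word)."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for word, count in pairs:
            groups[word].append(count)
    return groups

def reduce_phase(word, counts):
    """Reduce: combine all values for one key into a single result."""
    return word, sum(counts)

def map_reduce_word_count(chunks, workers=4):
    # Map tasks run concurrently; on a real cluster each would run on a different computer.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, chunks))
    groups = shuffle_phase(mapped)
    # Reduce tasks are independent per key, so they could also be spread across machines.
    return dict(reduce_phase(word, counts) for word, counts in groups.items())

if __name__ == "__main__":
    documents = ["big data is big", "data lakes store raw data"]
    print(map_reduce_word_count(documents))
    # prints {'big': 2, 'data': 3, 'is': 1, 'lakes': 1, 'store': 1, 'raw': 1}
```

The key design point is that no map task needs to see another task's chunk, and no reduce task needs another key's values; that independence is what lets the work be split across many computers.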
Analytical Structured Data
Data warehouses and data marts (data modeled/structured and stored for anticipated pre-determined analytical use)
Big Data
Massive volumes of diverse and rapidly growing data that are not formally modeled
Transactional Structured Data
Operational Databases (data modeled/structured and stored for anticipated pre-determined operational use)
True or false: Big data methods replace database and data warehousing approaches developed for managing and utilizing formally modeled data assets.
False. Big data methods complement these approaches rather than replace them.