Data Warehouse
Four common characteristics of big data (the 4 V's)
Variety, veracity, volume, and velocity
Ineffective Direct Data Access
Most data stored in operational databases did not allow users direct access; users had to wait for MIS professionals who could code SQL to answer their queries or questions
poor data quality
The data, if available, were often incorrect or incomplete, so users could not rely on them to make decisions
Data-mining tools
Use a variety of techniques to find patterns and relationships in large volumes of information
inadequate data usefulness
Users could not get the data they needed; what was collected was not always useful for its intended purposes
Big Data
A collection of large, complex data sets, including structured and unstructured data, that cannot be analyzed using traditional database methods and tools. It has four common characteristics: variety, veracity, volume, and velocity
Extraction, transformation, and loading (ETL)
A process that extracts information from internal and external databases, transforms the information using a common set of enterprise definitions, and loads the information into a data warehouse
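The three ETL steps can be illustrated with a minimal sketch. Everything named here is hypothetical: the two source lists, their field names, and the shared record layout (customer, date, amount) are assumptions made for illustration, and an in-memory list stands in for the warehouse.

```python
from datetime import datetime

# Hypothetical source extracts; each department records the same facts differently.
SALES_ROWS = [
    {"cust": "Acme Corp ", "sale_date": "03/15/2024", "amount": "1,200.50"},
]
SUPPORT_ROWS = [
    {"customer_name": "acme corp", "opened": "2024-03-16", "cost": 75},
]


def extract():
    """Extract: pull raw rows from each source, tagged with their origin."""
    for row in SALES_ROWS:
        yield "sales", row
    for row in SUPPORT_ROWS:
        yield "support", row


def transform(source, row):
    """Transform: map source-specific fields onto one shared definition."""
    if source == "sales":
        return {
            "customer": row["cust"].strip().title(),
            # Convert US-style dates to ISO 8601 so all sources agree.
            "date": datetime.strptime(row["sale_date"], "%m/%d/%Y").date().isoformat(),
            "amount": float(row["amount"].replace(",", "")),
            "source": source,
        }
    return {
        "customer": row["customer_name"].strip().title(),
        "date": row["opened"],
        "amount": float(row["cost"]),
        "source": source,
    }


def load(warehouse, record):
    """Load: append the standardized record to the (in-memory) warehouse."""
    warehouse.append(record)


def run_etl():
    warehouse = []
    for source, row in extract():
        load(warehouse, transform(source, row))
    return warehouse
```

After run_etl(), both records use the same customer spelling, date format, and numeric amount type, which is what "a common set of enterprise definitions" amounts to in this toy case.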
Data warehouse
A logical collection of information - gathered from many different operational databases - that supports business analysis activities and decision-making tasks
Databases are two-dimensional: rows (entities) and columns (attributes). A data warehouse is multidimensional, with layers of rows and columns
Dimension
A particular attribute of information
Data Mining Techniques (CEAC)
Classification, estimation, affinity grouping, and clustering
Cube
Common term for the representation of multidimensional information
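As a rough illustration of a cube, the sketch below stores one measure (units sold) at the intersection of three dimensions. The dimension names, the fact rows, and the helper functions build_cube, roll_up, and slice_cube are all hypothetical; real warehouse tools represent cubes very differently.

```python
from collections import defaultdict

# Hypothetical fact rows: (product, region, month, units_sold)
FACTS = [
    ("widget", "east", "jan", 10),
    ("widget", "west", "jan", 5),
    ("gadget", "east", "jan", 7),
    ("widget", "east", "feb", 3),
]
# Each dimension is one attribute of the information.
DIMENSIONS = ("product", "region", "month")


def build_cube(facts):
    """Store the measure at each (product, region, month) coordinate."""
    cube = defaultdict(int)
    for product, region, month, units in facts:
        cube[(product, region, month)] += units
    return cube


def roll_up(cube, keep):
    """Sum the measure over every dimension not named in `keep`."""
    idx = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for coords, units in cube.items():
        totals[tuple(coords[i] for i in idx)] += units
    return dict(totals)


def slice_cube(cube, dimension, value):
    """Keep only the cells where one dimension has a fixed value."""
    i = DIMENSIONS.index(dimension)
    return {c: u for c, u in cube.items() if c[i] == value}
```

Calling roll_up(build_cube(FACTS), ["product"]) collapses region and month to give totals per product, while slice_cube(..., "month", "jan") keeps a single layer of the cube.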
Structured Data
Has a defined length, type, and format, and includes numbers, dates, or strings; it can be machine-generated or human-generated
Data mart
Contains a subset of data warehouse information
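One way to picture the warehouse-to-mart relationship is as a filter. The sketch below assumes the subset is chosen by department, which is only one possible criterion; the WAREHOUSE rows and the build_data_mart helper are hypothetical.

```python
# Hypothetical warehouse rows covering several departments.
WAREHOUSE = [
    {"department": "marketing", "metric": "campaign_clicks", "value": 1200},
    {"department": "finance", "metric": "quarterly_revenue", "value": 98000},
    {"department": "marketing", "metric": "leads", "value": 85},
]


def build_data_mart(warehouse, department):
    """Copy only the rows relevant to one department into a smaller store."""
    return [dict(row) for row in warehouse if row["department"] == department]
```

Here build_data_mart(WAREHOUSE, "marketing") returns the two marketing rows and leaves the warehouse itself unchanged.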
Three organizational methods for analyzing big data
Data mining, big data analytics, and data visualization
Data Visualization
Infographics, analysis paralysis, data visualization, data visualization tools, business intelligence dashboards, data artist
Unstructured data
Not defined and does not follow a specified format; typically free-form text such as emails, tweets, and text messages
Data mining analysis methods (POFR)
Prediction, optimization, forecasting, and regression
Data mining
The process of analyzing data to extract information not offered by the raw data alone
primary purpose of a data warehouse
To aggregate information throughout an organization into a single repository for decision-making purposes
inconsistent data definitions
Every department had its own method for recording data, so when departments tried to share information, the data did not match and users did not get the data they really needed
Data Warehousing fixes the following problems
Inconsistent data definitions, lack of data standards, poor data quality, inadequate data usefulness, and ineffective direct data access
lack of data standards
Managers needed to perform cross-functional analysis using data from all departments, but the data differed in granularity, format, and level