ISTM Chapter 17
transform
Once you've extracted data, it needs to be normalized. Data is no good to you unless it's organized. Normalizing data means that your data is organized into the fields and records of a relational database.
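A minimal Python sketch of the normalizing idea, assuming hypothetical order rows in which customer details repeat on every row. The sketch splits them into a customers table keyed by `customer_id` and an orders table that refers back to it. The field names and the `normalize` helper are illustrative only and do not come from any particular tool.

```python
# Hypothetical flat records, as they might arrive from an extract step:
# customer details are repeated on every order row.
flat_rows = [
    {"order_id": 1, "customer_id": "C1", "customer_name": "Ana", "amount": 25.0},
    {"order_id": 2, "customer_id": "C2", "customer_name": "Ben", "amount": 40.0},
    {"order_id": 3, "customer_id": "C1", "customer_name": "Ana", "amount": 15.0},
]


def normalize(rows):
    """Split flat rows into a customers table and an orders table.

    Each customer is stored once; orders refer to it by customer_id,
    which mirrors the fields-and-records layout of a relational database.
    """
    customers = {}
    orders = []
    for row in rows:
        # Storing by key means a repeated customer overwrites, not duplicates.
        customers[row["customer_id"]] = {"customer_name": row["customer_name"]}
        orders.append({
            "order_id": row["order_id"],
            "customer_id": row["customer_id"],
            "amount": row["amount"],
        })
    return customers, orders


customers, orders = normalize(flat_rows)
print(customers)    # {'C1': {'customer_name': 'Ana'}, 'C2': {'customer_name': 'Ben'}}
print(len(orders))  # 3
```

The customer name now lives in one place, so correcting it means changing one record instead of every order row.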
ERP
enterprise resource planning
What refers to how fast data is collected?
velocity
What refers to quality of data?
veracity
What refers to the amount of data?
volume
four V's
volume, velocity, variety, veracity
data mining
or data discovery, examination of huge sets of data to find patterns and connections and identify outliers
text analytics
or text mining, hunts through unstructured data to look for useful patterns
In what decade did businesses start using DSS?
1950s
How much data is unstructured?
80%
What refers to an assortment of software applications to analyze an organization's raw data?
BI
What uses a cluster system that allows data to be stored on multiple servers?
Hadoop
Hadoop
an infrastructure for storing and processing large sets of data across multiple servers. It uses a cluster system that allows files to be stored on multiple servers. It is an open-source software framework written in Java for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware.
dashboards
easy-to-use graphical interfaces that characterize specific data analysis through visualization; they make it a lot easier to make sense of data and see the resulting information
predictive analytics
attempts to reveal future patterns in a marketplace, essentially trying to predict the future by looking for data correlations
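A toy illustration of predicting from a pattern in past data: an ordinary least-squares line fitted to hypothetical monthly sales, then extended one month ahead. Real predictive analytics tools use far richer models; `fit_line` and the sales figures are made up for illustration and assume at least two data points.

```python
def fit_line(ys):
    """Ordinary least-squares fit of y = intercept + slope * x for x = 0, 1, 2, ...

    Assumes len(ys) >= 2 so the x-variance is non-zero.
    """
    n = len(ys)
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Covariance and variance sums; their ratio is the best-fit slope.
    cov = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    var = sum((x - x_mean) ** 2 for x in xs)
    slope = cov / var
    intercept = y_mean - slope * x_mean
    return intercept, slope


monthly_sales = [10.0, 12.0, 14.0, 16.0]  # hypothetical figures
intercept, slope = fit_line(monthly_sales)
# Forecast the next month (x = 4) by extending the fitted line.
forecast = intercept + slope * len(monthly_sales)
print(slope, intercept, forecast)  # 2.0 10.0 18.0
```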
Large collected datasets are called what?
big data
decision analytics
builds on predictive analysis to make decisions about future industries and marketplaces
semi-structured
can be converted into structured data easily
BI can be described as
computer apps that change data into significant, meaningful info that helps orgs make better decisions; the set of techniques for turning raw data into useful info for business analysis purposes
DSS (decision support systems)
computer-based systems that support an org's decision-making activities. Ex: loan officers at a bank use DSS to verify the credit of a loan applicant
In BI, what does the acronym CRM mean?
customer relationship management
CRM
customer relationship management; the component of an ERP system used to track and organize communication with customers (e.g., Siebel). Holds info about sales, marketing, customer service records, and much more
What is a graphical interface that characterizes specific data analysis through visualization?
dashboard
What software helps BI become more usable through visualization?
dashboards
What is another term for Data Mining?
data discovery
analyzing big data
data mining, or data discovery, is the examination of huge sets of data to find patterns and connections, and identify outliers.
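One simple way to flag outliers, shown as a sketch rather than a full data mining method: compute each value's z-score (its distance from the mean in standard deviations) and flag values beyond a threshold. The `daily_sales` numbers, the `find_outliers` helper, and the threshold of 2.0 are all hypothetical choices.

```python
from statistics import mean, pstdev

daily_sales = [100, 104, 98, 101, 97, 103, 250, 99]  # hypothetical data


def find_outliers(values, threshold=2.0):
    """Return values whose z-score magnitude exceeds threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical, so nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


outliers = find_outliers(daily_sales)
print(outliers)  # [250]
```

A single extreme value also inflates the mean and standard deviation, so on small data sets this rule can miss outliers; it is only a starting point.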
What consolidates disparate data?
data warehouse
storing and managing big data
data warehouses are used to consolidate disparate data in a central location; they can hold yottabytes of data
In relation to BI, what does the acronym DSS mean?
decision support system
unstructured data
disorganized data that cannot be easily read or processed by a computer because it is not stored in rows and columns like traditional data tables; almost 80% of data is unstructured
ETL
extract, transform, load
data visualization
graphic display of the results of data mining, analytics, and BI in general, typically in real time
Data visualization software
helps BI program results become more understandable and therefore, more meaningful in decision making.
structured data
data in fixed formats, well labeled, and often stored in the traditional fields and records of common data tables
What held up the emergence of the Cloud?
internet speeds
Descriptive analytics
is the baseline on which other types of analytics are built. Descriptive analytics define past data you already have that can be grouped into significant pieces, like a department's sales results, and also start to reveal trends.
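A minimal sketch of descriptive analytics on past data, assuming hypothetical sales records of the form (department, quarter, amount). It groups the data into department totals and computes quarter-over-quarter changes as a first look at trends; `totals_by` and `trend` are illustrative helpers.

```python
from collections import defaultdict

# Hypothetical past sales: (department, quarter, amount)
sales = [
    ("Electronics", "2023-Q1", 1200.0),
    ("Electronics", "2023-Q2", 1500.0),
    ("Grocery", "2023-Q1", 800.0),
    ("Grocery", "2023-Q2", 700.0),
]


def totals_by(records, key_index):
    """Sum amounts grouped by the field at key_index (0 = department, 1 = quarter)."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec[key_index]] += rec[2]
    return dict(totals)


def trend(records, department):
    """Quarter-over-quarter change in sales for one department."""
    points = sorted((q, amt) for d, q, amt in records if d == department)
    return [later[1] - earlier[1] for earlier, later in zip(points, points[1:])]


print(totals_by(sales, 0))          # {'Electronics': 2700.0, 'Grocery': 1500.0}
print(trend(sales, "Electronics"))  # [300.0]
print(trend(sales, "Grocery"))      # [-100.0]
```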
veracity
is the data any good? Do you trust the data? Involves cleaning and scrubbing
Hadoop is written in
Java
data warehouse
like Oracle, IBM, SAS; a collection of data from a variety of sources used to support DSS and generate BI. Analogy: Sam's compared to Walmart. Like slice and dice: drill down to find the data you need. Holds yottabytes of data
A data mart is a smaller, more focused data warehouse, and they...
limit the complexity of databases so you can't answer as much as with a data warehouse, but they are cheaper to implement than a full warehouse
What provides a standard format to organize data?
normalization
extract
once you've determined where your data resides, you can start extracting it, often from CRM or ERP systems; the collection of the data
What attempts to reveal future patterns in the marketplace?
predictive analysis
MapReduce
the processing arm or engine of Hadoop; allows data to be queried and processed directly on the server where it lives, instead of moving the data across the network to be analyzed on another computer
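A single-process Python simulation of the MapReduce pattern (map, shuffle, reduce) using word counting, the usual teaching example. This is not Hadoop's Java API: each inner list in the hypothetical `node_data` merely stands for the data stored on one server, and `map_phase` runs against that local data so only small (word, count) pairs would need to cross the network.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical: each inner list stands for the text stored on one server (node).
node_data = [
    ["big data is big", "data lives on many servers"],
    ["hadoop moves the query not the data"],
]


def map_phase(lines):
    """Runs against one node's local lines; emits (word, 1) pairs."""
    for line in lines:
        for word in line.split():
            yield (word, 1)


def shuffle(pairs):
    """Groups the intermediate values by key (word)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Combines each key's values into a final count."""
    return {key: sum(values) for key, values in groups.items()}


mapped = chain.from_iterable(map_phase(lines) for lines in node_data)
counts = reduce_phase(shuffle(mapped))
print(counts["data"])  # 3
print(counts["big"])   # 2
```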
BI or Business Intelligence
refers to an assortment of software apps used to analyze an organization's raw data
Who benefits from big data?
retail: CRMs, optimizing storage and layout at stores; financial services: risk analysis, fraud detection; advertising: targeting the ads; government: security, counterterrorism; energy: smart grid; healthcare: risk factors, genome research
volume
sheer quantity, refers to the amount of data collected by an organization
data mart
smaller than a data warehouse; limits the complexity of databases, so you can't answer as much as with a data warehouse
Text Analytics
sometimes called text mining; hunts through unstructured text data to look for useful patterns, like whether customers on Facebook.com or Instagram.com are unsatisfied with the organization's products or services.
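A deliberately simple keyword-matching sketch of the idea, not a real text mining model: it scans hypothetical social media posts for a hand-picked list of negative terms and reports what share of posts look unhappy. The `NEGATIVE_TERMS` set and the posts are invented for illustration; production text analytics would use sentiment models rather than a fixed word list.

```python
import re

# Hypothetical hand-picked terms that suggest an unhappy customer.
NEGATIVE_TERMS = {"broken", "refund", "terrible", "late", "disappointed"}

posts = [
    "Love the new phone case!",
    "Order arrived late and the box was broken.",
    "Terrible support, I want a refund",
    "Shipping was fast.",
]


def negative_hits(text):
    """Return the negative terms that appear as whole words in text."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    return words & NEGATIVE_TERMS


unhappy = [p for p in posts if negative_hits(p)]
share = len(unhappy) / len(posts)
print(len(unhappy), share)  # 2 0.5
```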
velocity
speed at which data is gathered and processed
What kind of data resides in fixed formats?
structured
What is another term for Text Analytics?
text mining
in MapReduce, only ...
the query is transported through the network
load
data is transferred into the data warehouse or data mart. Loading sometimes happens weekly, daily, or even hourly.
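A small end-to-end sketch of extract, transform, load using only the Python standard library. The CSV text stands in for a hypothetical export from a source system such as a CRM; the in-memory SQLite database stands in for the data warehouse or data mart, and the `sales_fact` table name is an illustrative assumption.

```python
import csv
import io
import sqlite3

# Hypothetical source export (e.g. from a CRM); stands in for a real extract.
RAW_CSV = """region,customer,amount
 east ,Ana,25.00
WEST,Ben,40.50
east,Cy,not-a-number
"""


def extract(text):
    """Extract: read the raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: clean and standardize rows, dropping ones that fail validation."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount is not a number
        clean.append((row["region"].strip().lower(), row["customer"].strip(), amount))
    return clean


def load(conn, rows):
    """Load: write the cleaned rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_fact (region TEXT, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))
result = conn.execute(
    "SELECT region, SUM(amount) FROM sales_fact GROUP BY region ORDER BY region"
).fetchall()
print(result)  # [('east', 25.0), ('west', 40.5)]
```

A scheduled job (weekly, daily, or hourly) would simply rerun the same three steps against fresh source data.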
In BI, which of the following is not part of ETL?
transmission
topic analytics
tries to catalog phrases of an organization's customer feedback into relevant topics
What refers to different kinds of data?
variety
variety
you know what data you wish to collect, but is it structured, unstructured, or semi-structured? Structured: customer ratings (1-5), business transactions. Semi-structured: not fully structured but has tags (e.g., XBRL, a business reporting language that uses tags, or emails with To, From, and Subject fields). Unstructured: call center service records or customer complaints.
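A short sketch of why email counts as semi-structured, using Python's standard `email` package on a hypothetical message. The tagged header fields (From, To, Subject) can be pulled out like columns in a table row, while the body remains free, unstructured text.

```python
from email import message_from_string

# Hypothetical customer email.
raw_email = """From: customer@example.com
To: support@example.com
Subject: Damaged order

The package arrived crushed and one item was missing.
"""

msg = message_from_string(raw_email)

# The tagged header fields behave like structured columns...
record = {"from": msg["From"], "to": msg["To"], "subject": msg["Subject"]}
# ...while the body stays free text (unstructured).
body = msg.get_payload()

print(record)
print(body.strip())
```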