CIS UNIT 3 MO10A
information scrubbing, information cleansing
two terms that describe the process of weeding out, fixing, or discarding inconsistent, incorrect, or incomplete information
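As a minimal sketch, assuming hypothetical customer records stored as Python dictionaries, the three cleansing actions named above (fix, discard, weed out duplicates) might look like this. The field names and cleansing rules are illustrative assumptions, not a standard procedure.

```python
# Hypothetical customer records; None marks a missing value.
records = [
    {"id": 1, "name": "Ann Lee", "email": "ann@example.com"},
    {"id": 1, "name": "Ann Lee", "email": "ann@example.com"},         # duplicate
    {"id": 2, "name": "  bob  smith ", "email": "BOB@EXAMPLE.COM"},  # inconsistent formatting
    {"id": 3, "name": "Cy Doe", "email": None},                       # incomplete
]

def cleanse(rows):
    """Fix formatting, discard incomplete rows, and weed out duplicates."""
    seen = set()
    clean = []
    for row in rows:
        # Discard: drop any record that is missing a required field.
        if any(value is None for value in row.values()):
            continue
        # Fix: normalize whitespace and case so equivalent values compare equal.
        fixed = {
            "id": row["id"],
            "name": " ".join(row["name"].split()).title(),
            "email": row["email"].strip().lower(),
        }
        # Weed out: skip records whose key has already been kept.
        key = (fixed["id"], fixed["email"])
        if key in seen:
            continue
        seen.add(key)
        clean.append(fixed)
    return clean

print(cleanse(records))
```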
Extraction, transformation, and loading (ETL)
Process that extracts information from internal and external databases, transforms it using a common set of enterprise definitions, and loads it into a data warehouse
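The three ETL steps can be sketched with Python's built-in `sqlite3` module, using an in-memory database as a stand-in for a data warehouse. The two "operational databases", their field names, and the common definition (uppercase customer ID, ISO date, amount in dollars) are hypothetical assumptions chosen only to show the extract, transform, and load stages.

```python
import sqlite3

# Hypothetical operational sources that follow different conventions.
sales_db = [("C-001", "2024-01-15", "19.99")]                    # amount as a dollar string
web_db = [{"cust": "c-002", "date": "01/20/2024", "cents": 4550}]  # US date, amount in cents

def extract():
    """Extract: pull the raw rows from each source."""
    return sales_db, web_db

def transform(sales, web):
    """Transform: map both sources onto one assumed common definition:
    (uppercase customer_id, ISO YYYY-MM-DD date, amount in dollars)."""
    rows = []
    for cust, date, amount in sales:
        rows.append((cust.upper(), date, float(amount)))
    for rec in web:
        month, day, year = rec["date"].split("/")
        rows.append((rec["cust"].upper(), f"{year}-{month}-{day}", rec["cents"] / 100))
    return rows

def load(rows, conn):
    """Load: insert the conformed rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer_id TEXT, sale_date TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

warehouse = sqlite3.connect(":memory:")  # stand-in for a data warehouse
load(transform(*extract()), warehouse)
print(warehouse.execute("SELECT * FROM sales ORDER BY customer_id").fetchall())
```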
cluster analysis
a technique used to divide information sets into mutually exclusive groups such that the members of each group are as close to one another as possible and the different groups are as far apart as possible
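One common way to perform cluster analysis is k-means; the sketch below is a basic version of it, chosen here for illustration rather than taken from the course material. Each point lands in exactly one group (mutually exclusive), and each centroid moves to the mean of its group so members stay close together. The sample points and the deterministic starting centroids are assumptions.

```python
import math

def kmeans(points, k, iterations=10):
    """Group points into k mutually exclusive clusters (basic k-means)."""
    # Deterministic start for illustration: use the first k points as centroids.
    centroids = list(points[:k])
    groups = []
    for _ in range(iterations):
        # Assign each point to exactly one group: the one with the nearest centroid.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            groups[nearest].append(p)
        # Move each centroid to the mean of its group (keep it if the group is empty).
        centroids = [
            tuple(sum(coords) / len(g) for coords in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return groups, centroids

# Hypothetical two-feature customer data points.
points = [(1, 1), (8, 8), (1.5, 2), (9, 8.5), (2, 1), (8.5, 9)]
groups, centroids = kmeans(points, k=2)
print(groups)
```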
data rich, information poor
accurately describes the problem of having too much data but too little useful information
velocity (big data characteristic)
analysis of streaming data as it travels around the internet, such as the analysis needed for social media messages spreading globally
market basket analysis
analyzes such items as websites and checkout scanner information to detect customers' buying behavior and predict future behavior by identifying affinities among customers' choices of products and services
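A minimal sketch of detecting affinities among products, assuming hypothetical checkout-scanner baskets. It uses support and confidence, two widely used measures from association rule mining, as the affinity scores; the course definition does not name a specific measure, so this choice is an assumption.

```python
from collections import Counter
from itertools import combinations

# Hypothetical checkout-scanner baskets (one set of products per transaction).
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]

# How many baskets contain each item, and each (alphabetically ordered) pair of items.
item_counts = Counter(item for basket in baskets for item in basket)
pair_counts = Counter(
    pair for basket in baskets for pair in combinations(sorted(basket), 2)
)

def support(a, b):
    """Fraction of all baskets that contain both a and b."""
    return pair_counts[tuple(sorted((a, b)))] / len(baskets)

def confidence(a, b):
    """Of the baskets that contain a, the fraction that also contain b."""
    return pair_counts[tuple(sorted((a, b)))] / item_counts[a]

print(support("beer", "diapers"), confidence("beer", "diapers"), confidence("diapers", "beer"))
```

Here every basket with beer also has diapers (confidence 1.0), but only three of the four diaper baskets have beer (0.75), so the affinity is stronger in one direction.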
social media analytics
analyzes text flowing across the internet, including unstructured text from blogs and messages
web analytics
analyzes unstructured data associated with websites to identify consumer behavior and website navigation
text analytics
analyzes unstructured data to find trends and patterns in words and sentences
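A very small illustration of finding trends in words: tokenize free-form messages and count the most frequent terms. The messages and the tiny stopword list are hypothetical, and real text analytics tools go well beyond simple word counts.

```python
import re
from collections import Counter

# Hypothetical unstructured customer messages.
messages = [
    "Shipping was slow but support was great",
    "Great product, slow shipping",
    "Support team was great!",
]

# A tiny illustrative stopword list; real tools use much larger ones.
STOPWORDS = frozenset({"a", "an", "the", "was", "but", "and"})

def top_terms(texts, n=3):
    """Tokenize free-form text and return the n most frequent non-stopwords."""
    words = (w for text in texts for w in re.findall(r"[a-z']+", text.lower()))
    return Counter(w for w in words if w not in STOPWORDS).most_common(n)

print(top_terms(messages))
```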
classification (data mining activity)
assigns records to one of a predefined set of classes
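As a sketch, assuming a hypothetical credit-risk example, a k-nearest-neighbors classifier assigns a new record to one of a predefined set of classes by a majority vote of the most similar labeled records. k-NN is just one classification technique, picked here for brevity.

```python
import math
from collections import Counter

# Predefined set of classes (hypothetical credit-risk example).
CLASSES = ("low risk", "high risk")

# Hypothetical labeled records: (annual income in $1000s, debt ratio) -> class.
training = [
    ((85, 0.10), "low risk"),
    ((90, 0.15), "low risk"),
    ((30, 0.60), "high risk"),
    ((25, 0.70), "high risk"),
]

def classify(record, k=3):
    """Assign a record to one of CLASSES by majority vote of its k nearest labeled records."""
    # Features are left unscaled for simplicity, so income dominates the distance.
    nearest = sorted(training, key=lambda item: math.dist(item[0], record))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(classify((80, 0.20)), classify((28, 0.65)))
```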
data artist
business analytics specialist who uses visual tools to help people understand complex data
big data
collection of large, complex data sets, including structured and unstructured data, which cannot be analyzed using traditional database methods and tools
information cube
common term for the representation of multidimensional information
data mart
contains a subset of data warehouse information
data visualization
describes technologies that allow users to "see" or visualize data to transform information into a business perspective
estimation (data mining activity)
determines values for an unknown continuous variable, such as a behavior or an estimated future value
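Estimation can be illustrated with an ordinary least-squares line fit, one simple way to estimate a continuous value; the data below (advertising spend versus sales) is hypothetical and chosen so the fitted line is exact.

```python
def fit_line(xs, ys):
    """Ordinary least squares: return (slope, intercept) of the best-fit line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical history: ad spend (in $1000s) and resulting sales (in $1000s).
ad_spend = [1, 2, 3, 4]
sales = [3, 5, 7, 9]

slope, intercept = fit_line(ad_spend, sales)
estimate = intercept + slope * 5  # estimated sales for an unseen spend of 5
print(slope, intercept, estimate)
```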
affinity grouping (data mining activity)
determines which things go together
variety (big data characteristic)
different forms of structured and unstructured data; data from spreadsheets and databases as well as from email, videos, photos, and PDFs, all of which must be analyzed
dirty data
erroneous or flawed data
advanced analytics
focuses on forecasting future trends and producing insights using sophisticated quantitative methods, including statistics, descriptive and predictive data mining, simulation, and optimization
true
Gender, for instance, can be referred to in many ways (male/female, M/F, 1/0), but it should be standardized in a data warehouse, with one common way of referring to each data element that stores gender (M/F).
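A minimal sketch of standardizing gender to the single M/F convention before loading. The mapping table is hypothetical; in particular, which of 1/0 means male or female depends on the source system and is only an assumption here.

```python
# Hypothetical mapping from source encodings to the warehouse standard (M/F).
# ASSUMPTION: 1 = male and 0 = female; verify this against each source system.
GENDER_CODES = {
    "male": "M", "m": "M", "1": "M",
    "female": "F", "f": "F", "0": "F",
}

def standardize_gender(value):
    """Convert any recognized source encoding to 'M' or 'F'."""
    key = str(value).strip().lower()
    if key not in GENDER_CODES:
        # Flag unknown codes instead of guessing, so they can be cleansed later.
        raise ValueError(f"unrecognized gender code: {value!r}")
    return GENDER_CODES[key]

print([standardize_gender(v) for v in ["Male", "female", "M", "f", 1, 0]])
```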
data warehouse
logical collection of information, gathered from many different operational databases, that supports business analysis activities and decision-making tasks
data quality audits
many firms complete this to determine the accuracy and completeness of their data
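A data quality audit can be sketched as measuring, field by field, how complete the data is and how much of it passes a validity rule. The customer table and the simplified email rule below are hypothetical, and passing a format rule is only a proxy for true accuracy.

```python
import re

# Hypothetical customer table with an email column to audit.
customers = [
    {"id": 1, "email": "a@x.com"},
    {"id": 2, "email": ""},
    {"id": 3, "email": "bad-email"},
    {"id": 4, "email": "d@y.org"},
]

# Simplified, assumed validity rule for illustration only (not a full email spec).
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}")

def audit(rows, field, is_valid):
    """Return (completeness, accuracy) for one field.

    completeness: share of rows where the field is filled in.
    accuracy: share of filled-in values that pass the validity rule.
    """
    filled = [row[field] for row in rows if row.get(field) not in (None, "")]
    completeness = len(filled) / len(rows)
    accuracy = sum(1 for v in filled if is_valid(v)) / len(filled) if filled else 0.0
    return completeness, accuracy

completeness, accuracy = audit(customers, "email", lambda v: EMAIL_RE.fullmatch(v) is not None)
print(f"completeness={completeness:.0%} accuracy={accuracy:.0%}")
```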
data visualization tools
move beyond Excel graphs and charts into sophisticated analysis techniques such as pie charts, controls, instruments, maps, time-series graphs, and more
forecasts
prediction based on time-series information
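One of the simplest forecasting methods is a moving average over the most recent observations of a time series. The sketch below uses hypothetical monthly sales and an assumed window of three months; real forecasting usually relies on richer models.

```python
def moving_average_forecast(series, window=3):
    """Forecast the next period as the mean of the last `window` observations."""
    if len(series) < window:
        raise ValueError("series is shorter than the averaging window")
    return sum(series[-window:]) / window

# Hypothetical monthly sales, oldest first, one value per month (a fixed frequency).
monthly_sales = [120, 130, 125, 140, 150, 145]
print(moving_average_forecast(monthly_sales))
```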
infographics
present the results of data analysis, displaying the patterns, relationships, and trends in a graphical format
duplicate data, non-formatted data, incorrect data
problems associated with dirty data
data mining
process of analyzing data to extract information not offered by the raw data alone
speech analytics
process of analyzing recorded calls to gather information; brings structure to customer interactions and exposes information buried in customer contact center interactions with an enterprise
distributed computing
processes and manages algorithms across many machines in a computing environment
inconsistent data definitions, lack of data standards, poor data quality, inadequate data usefulness, ineffective direct data access
reasons why business analysis is difficult using operational databases
association detection
reveals the relationship between variables along with the nature and frequency of the relationships
volume (big data characteristic)
scale of data; includes the enormous volumes of data generated daily; massive volumes created by machines and networks
clustering (data mining activity)
segments a heterogeneous population of records into a number of more homogeneous subgroups
business intelligence
solution to the problem of being data rich and information poor
veracity (big data characteristic)
the uncertainty of data, including biases, noise, and abnormalities; uncertainty or untrustworthiness of data; data must be meaningful to the problem being analyzed
data mining, information cleansing (or scrubbing), data mart
three core concepts of data warehousing
data mining, big data analytics, data visualization
three organizational methods for analyzing big data
time-series information
timestamped information collected at a particular frequency
business intelligence dashboards
track corporate metrics such as critical success factors and key performance indicators, and include advanced capabilities such as interactive controls that allow users to manipulate data for analysis
structured data
what has a defined length, type, and format and includes numbers, dates, or strings such as Customer Address?
unstructured data
what is not defined, does not follow a specified format, and is typically free form text such as emails?
analysis paralysis
what occurs when a user goes into an emotional state of over-analyzing (or overthinking) a situation so that a decision or action is never taken, in effect paralyzing the outcome? In the age of big data, analysis paralysis is a growing problem.
data mining tools
what uses a variety of techniques to find patterns and relationships in large volumes of information that predict future behavior and guide decision making?
Where has the business been? Where is the business now? Where is the business going?
which of the following are answers to tough business questions BI can answer?
data scientists
who extracts knowledge from data by performing statistical analysis, data mining, and advanced analytics on big data to identify trends, market changes, and other relevant information?