ISYS ch 8

Forecasting model

Time series info is time-stamped info collected at a particular frequency. Forecasts are predictions based on time series info, allowing users to manipulate the time series for forecasting activities (ex. web visits per hour)
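
A minimal sketch in Python of a forecasting model over a time series: it predicts the next hour's web visits with a simple moving average. The hourly figures are hypothetical.

```python
# Forecast the next hour's web visits as the mean of the last few observations.
hourly_visits = [120, 135, 128, 142, 150, 147, 155, 160]  # hypothetical visits per hour

def moving_average_forecast(series, window=3):
    """Predict the next value as the average of the last `window` observations."""
    recent = series[-window:]
    return sum(recent) / len(recent)

print(moving_average_forecast(hourly_visits))  # 154.0
```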

Data understanding (data mining)

analyze all current data and identify any data quality issues

evaluation (data mining)

analyze the trends and patterns to assess the potential for solving the business problem

social media analysis

analyzes text flowing across the Internet, including unstructured text from blogs and messages

web analysis

analyzes unstructured data associated with websites to identify consumer behavior and website navigation

text analysis

analyzes unstructured data to find trends and patterns in words and sentences

fast data

application of big data analytics to small data sets in near-real or real-time in order to solve a problem or create business value

data modeling (data mining)

apply mathematical techniques to identify trends and patterns in the data

data artist

business analytics specialist who uses visual tools to help people understand complex data

the term fast data is usually associated with?

business intelligence and the goal is to quickly gather and mine structured and unstructured data so that action can be taken

How are classification analysis and cluster analysis different?

classification analysis requires that all classes be defined before the analysis begins, whereas cluster analysis discovers its groups from the data
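
A minimal sketch of classification analysis, assuming scikit-learn is available; the age/income records and the "low"/"high" risk classes are hypothetical. Note that the classes exist before the analysis starts, which is the contrast with cluster analysis.

```python
# Classification: the classes are defined before the analysis begins;
# cluster analysis, by contrast, would form the groups from the data itself.
from sklearn.neighbors import KNeighborsClassifier

X = [[25, 30000], [30, 35000], [50, 90000], [55, 95000]]  # hypothetical (age, income)
y = ["low", "low", "high", "high"]                        # predefined classes

clf = KNeighborsClassifier(n_neighbors=1).fit(X, y)
print(clf.predict([[28, 32000]]))  # ['low']
```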

big data

collection of large, complex data sets including structured and unstructured data, which cannot be analyzed using traditional database methods and tools

cube

common term for the representation of multidimensional info

What are the 2 primary computing models that have shaped the collection of big data?

distributed computing and virtualization

data preparation (data mining)

gather and organize the data in the correct formats and structures for analysis

exploratory data analysis

identifies patterns in data including outliers, uncovering the underlying structure to understand relationships between the variables

cluster analysis

technique used to divide an info set into mutually exclusive groups such that the members of each group are as close together as possible to one another and the different groups are as far apart as possible (ex. target marketing based on zip codes)
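
A minimal sketch of cluster analysis using k-means, assuming scikit-learn is available; the customer records (income, age) are hypothetical.

```python
# Divide the info set into 3 mutually exclusive groups with k-means.
from sklearn.cluster import KMeans

customers = [[35000, 22], [38000, 25], [90000, 45],
             [95000, 50], [60000, 34], [62000, 36]]  # hypothetical (income, age)

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(customers)
print(labels)  # cluster id assigned to each customer, e.g. [0 0 1 1 2 2]
```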

Data mining process model

1. Business Understanding 2. Data Understanding 3. Data Preparation 4. Data Modeling 5. Evaluation 6. Deployment

What does insights extracted from data profiling do?

Can determine how easy or difficult it will be to use existing data for other purposes along with providing metrics on data quality

what are the 3 elements of data mining?

Data, discovery and deployment

Regression model

a statistical process for estimating the relationships among variables; includes many techniques for modeling and analyzing several variables when the focus is on the relationship between a dependent and independent variable (ex. predict the winners of a marathon based on gender and weight)
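
A minimal sketch of a regression model, assuming NumPy is available: it fits a line relating a runner's weight (independent variable) to marathon finishing time (dependent variable). The data points are hypothetical.

```python
# Fit a simple linear regression and use it to estimate a future value.
import numpy as np

weight_kg = np.array([55, 60, 65, 70, 75, 80])
finish_minutes = np.array([150, 158, 165, 174, 181, 190])  # hypothetical times

slope, intercept = np.polyfit(weight_kg, finish_minutes, deg=1)
predicted = slope * 68 + intercept  # predict the time for a 68 kg runner
print(round(predicted, 1))          # ~170.5 minutes
```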

Optimization model

a statistical process that finds the way to make a design, system, or decision as effective as possible (ex. choosing a combination of projects to maximize overall earnings)
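
A minimal sketch of an optimization model in Python: it picks the combination of projects that maximizes overall earnings without exceeding a budget. The project costs, earnings, and budget are hypothetical; brute force is fine at this small scale.

```python
# Choose the combination of projects with the highest earnings within budget.
from itertools import combinations

projects = {"A": (40, 90), "B": (30, 60), "C": (50, 120), "D": (20, 35)}  # name: (cost, earnings)
budget = 80

best_combo, best_earnings = (), 0
for r in range(1, len(projects) + 1):
    for combo in combinations(projects, r):
        cost = sum(projects[p][0] for p in combo)
        earnings = sum(projects[p][1] for p in combo)
        if cost <= budget and earnings > best_earnings:
            best_combo, best_earnings = combo, earnings

print(best_combo, best_earnings)  # ('B', 'C') 180
```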

Business understanding (data mining)

gain a clear understanding of the business problem that must be solved and how it impacts the company

Recommendation engine

data mining algorithm that analyzes a customer's purchases and actions on a website and then uses the data to recommend complementary products
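
A minimal sketch of a recommendation engine in Python: it counts which products are most often purchased together and suggests complements. The order data is hypothetical.

```python
# Recommend the items that most frequently co-occur with a given product.
from collections import Counter

orders = [
    {"laptop", "mouse", "laptop bag"},
    {"laptop", "mouse"},
    {"laptop", "laptop bag"},
    {"phone", "case"},
]

def recommend(product, orders, top_n=2):
    """Return the items most often bought alongside `product`."""
    co_counts = Counter()
    for order in orders:
        if product in order:
            co_counts.update(order - {product})
    return [item for item, _ in co_counts.most_common(top_n)]

print(recommend("laptop", orders))  # ['mouse', 'laptop bag']
```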

outlier

data value that is numerically distant from most of the other data points in a set of data; often identified with the help of anomaly detection
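
A minimal sketch of anomaly detection for outliers in Python: it flags values more than two standard deviations from the mean. The data set is hypothetical.

```python
# Flag data points that are numerically distant from the rest of the set.
import statistics

values = [21, 23, 22, 24, 20, 25, 23, 80]  # 80 is numerically distant

mean = statistics.mean(values)
stdev = statistics.stdev(values)
outliers = [v for v in values if abs(v - mean) > 2 * stdev]
print(outliers)  # [80]
```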

deployment (data mining)

deploy the discoveries to the organization for work in everyday business

data visualization

describes technologies that allow users to see or visualize data to transform info into a business perspective

correlation analysis

determines a statistical relationship between variables, often for the purpose of identifying predictive factors among the variables
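
A minimal sketch of correlation analysis, assuming NumPy is available: it measures the statistical relationship between ad spend and sales. The figures are hypothetical.

```python
# Compute the Pearson correlation coefficient between two variables.
import numpy as np

ad_spend = [10, 20, 30, 40, 50]
sales = [120, 180, 260, 310, 400]

r = np.corrcoef(ad_spend, sales)[0, 1]
print(round(r, 3))  # ~0.997, a strong positive (predictive) relationship
```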

estimation analysis

determines values for an unknown continuous variable's behavior or estimated future value; one of the least expensive modeling techniques

what does variety of big data mean?

different forms of structured/unstructured data, data from spreadsheets and databases as well as from email, videos, photos and PDFs, all of which must be analyzed

data scientist

extracts knowledge from data by performing statistical analysis, data mining and advanced analytics on big data to identify trends, market changes and other relevant info

big data includes data sources that include

extremely large volumes of data, with high velocity, wide variety and an understanding of the data veracity

Data

foundation for data directed decision making

Data mining can determine relationships among...

internal factors (such as price, product positioning or staff skills) and external factors (economic indicators, competition, and customer demographics)

algorithms

mathematical formulas placed in software that perform an analysis on a data set

With distributed computing individual computers are...

networked together across geographical areas and work together to execute a workload or computing processes as if they were one single computing environment

estimation models predict

numeric outcomes based on historical data

analysis paralysis

occurs when the user goes into an emotional state of over-analyzing a situation so that a decision or action is never taken, in effect paralyzing the outcome

market basket analysis

one of the most common forms of association detection analysis; evaluates such items as websites and checkout scanner info to detect customers' buying behavior and predict future behavior by identifying affinities among customers' choices of products and services
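
A minimal sketch of market basket analysis in Python: it computes support and confidence for the association "bread -> butter" from checkout data. The baskets are hypothetical.

```python
# Evaluate checkout data to detect affinities among product choices.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
]

both = sum(1 for b in baskets if {"bread", "butter"} <= b)
bread = sum(1 for b in baskets if "bread" in b)

support = both / len(baskets)  # how often the pair appears together overall
confidence = both / bread      # how often butter is bought given bread is bought
print(support, confidence)     # 0.5 0.666...
```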

3 common data mining techniques for predictions

optimization, forecasting, and regression models

infographics

present the results of data analysis, displaying the patterns, relationships, and trends in a graphical format

discovery

process of identifying new patterns, trends, and insights

deployment

process of implementing discoveries to drive success

Distributed computing

processes and manages algorithms across many machines in a computing environment

Data mining allows users to

recycle their work to become more efficient and effective in solving future problems

affinity grouping analysis

reveals the relationship between variables along with the nature and frequency of the relationship; creates rules to determine the likelihood of events occurring together at a particular time or following each other in a logical progression

data visualization tools

sophisticated analysis techniques such as controls, instruments, maps, and time-series graphs

prediction

statement about what will happen or might happen in the future

what does velocity of big data mean?

the analysis of streaming data as it travels across the Internet, such as the analysis of social media messages spreading globally

pattern recognition analysis

the classification or labeling of an identified pattern in the machine learning process

virtualization

the creation of a virtual (rather than actual) version of computing resources, such as an operating system, a server, a storage device, or network resources

data mining

the process of analyzing data to extract info not offered by the raw data alone

speech analysis

the process of analyzing recorded calls to gather info; heavily used in customer service

Data profiling

the process of collecting statistics and info about data in an existing source

anomaly detection

the process of identifying rare or unexpected items or events in a data set that do not conform to other items in the data set

classification analysis

the process of organizing data into categories or groups for its most effective and efficient use

data replication

the process of sharing info to ensure consistency between multiple data sources

what does volume of big data mean?

the scale of data; includes enormous volumes of data generated daily; massive volume created by machines and networks; big data tools necessary to analyze zettabytes and brontobytes

analytics

the science of fact-based decision making; uses software-based algorithms and statistics to derive meaning from data

what does veracity of big data mean?

the uncertainty of data, including biases, noise and abnormalities; uncertainty or untrustworthiness of data; data must be meaningful to the problem being analyzed; must keep data clean and implement processes to keep dirty data from accumulating in systems

Why do companies use data mining techniques?

to compile a complete picture of their operations, all within a single view, allowing them to identify trends and improve forecasts

business intelligence dashboards

track corporate metrics such as critical success factors and key performance indicators and include advanced capabilities such as interactive controls, allowing users to manipulate data for analysis

behavioral analysis

using data about people's behaviors to understand intent and predict future actions

data mining tools

variety of techniques to find patterns and relationships in large volumes of info that predict future behavior and guide decision making

What are the 4 common characteristics of big data?

variety, veracity, volume and velocity

