Final Exam (Chapters 17, 18)

Data engineer

- builds the technological infrastructure and architecture for gathering, growing, and storing raw data - data analysts, data scientists, and statisticians depend on the work of data engineers to have access to data

Data analyst

- collects, manipulates, and analyzes data across a business - understands the business processes and technical aspects of an organization - less scientifically focused than a data scientist or statistician but more coding oriented than a business analyst - this role requires both technical and interpersonal skills

Reinforcement learning

- focuses on decision making by rewarding desired behaviors and/or punishing undesired behaviors - decisions are made sequentially by the AI, and each decision is rewarded or punished to train the AI - this hybrid approach may require human intervention

Prescriptive analytics

- identifies what we should do - requires advanced programming - consists of advanced, cutting-edge technology and design - uses a decision-making protocol and historical data to train a program on what to do in a real-time situation

Statistician

- provides a methodology for drawing conclusions from data - the role is math oriented and focuses on collection and interpretation of quantitative data using defined scientific methods

Diagnostic analytics

- provides insight into why something happened - drills down to a granular level - often looks at data in a variety of ways to identify trends or investigate causes

Descriptive analytics

- tells us what has happened - looks at historical data and condenses it into smaller, more meaningful bits of information - it is the most common form of analysis - it is easy to access and visualize, and is based on historical data

Unsupervised learning

- uses algorithms to analyze unlabeled data sets for hidden patterns - it does not require human intervention

Predictive analytics

- uses statistical modeling and algorithms to predict what is likely to happen - provides powerful tools that assist in decision making and inform future actions - requires assumptions and identifies possible outcomes based on these assumptions

Supervised learning

- uses labeled data sets to train the algorithm to classify data or predict outcomes from a data set - before it can begin, human intervention is necessary to establish the labels within the data set

Business analyst or business intelligence analyst

- works with data to find trends and leverage insights - possesses a deep understanding of business processes and can evaluate them, analyze key metrics, and provide strategic recommendations - must assess requirements from a business perspective and understand data visualization - communication skills are important because people in this role must effectively present information and actionable insights to business leaders

Data scientist

- works with large volumes of data - designs and programs algorithms to collect and analyze data and perform predictive analysis - possesses technical skills such as an understanding of higher-level math and proficiency in coding

Time trend:

A consistent movement that does not repeat

Seasonality:

A consistent movement that repeats on a regular basis

Noise:

An additional movement that cannot be explained as a trend or seasonality
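
The three components above can be illustrated by building a small series from them. This is a minimal sketch that assumes an additive model (value = trend + seasonality + noise); the quarterly figures and variable names are hypothetical and chosen only for illustration.

```python
# Hypothetical quarterly revenue built from the three components.
periods = 8

# Time trend: a consistent movement that does not repeat (steady growth).
trend = [100 + 5 * t for t in range(periods)]

# Seasonality: a consistent movement that repeats every 4 quarters.
seasonal_pattern = [0, -10, 0, 20]
seasonality = [seasonal_pattern[t % 4] for t in range(periods)]

# Noise: a one-off spike in period 5 that neither component explains.
noise = [0, 0, 0, 0, 0, 35, 0, 0]

# Assumption: the components simply add together.
series = [trend[t] + seasonality[t] + noise[t] for t in range(periods)]
print(series)
```

Subtracting the trend and the repeating pattern from the series leaves only the noise, which is one way to spot the unexplained spike.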

Natural language processing (NLP):

An advanced type of textual analysis that uses artificial intelligence to read, understand, and derive meaning from human language; an example is a chatbot that helps with virtual communication with customers

____, also known as outlier analysis, reveals observations or events that are outside a data set's normal behavior; it is an important data analytics objective that can be the purpose of many data analytics techniques

Anomaly detection

_____ reveals observations or events that are outside of a data set's normal behavior

Anomaly detection
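
One simple way to flag observations outside a data set's normal behavior is a z-score check. The sketch below is only one possible approach, not a method named in these notes; the function name `find_anomalies`, the threshold of 2.0, and the sample amounts are all assumptions made for illustration.

```python
from statistics import mean, stdev


def find_anomalies(values, threshold=2.0):
    """Return values whose distance from the mean exceeds `threshold` standard deviations."""
    avg = mean(values)
    spread = stdev(values)  # sample standard deviation
    if spread == 0:
        return []  # every value is identical, so nothing stands out
    return [v for v in values if abs(v - avg) / spread > threshold]


# Hypothetical daily transaction amounts with one unusual entry.
amounts = [102, 98, 105, 97, 101, 99, 103, 100, 950, 104]
print(find_anomalies(amounts))
```

A z-score threshold is sensitive to the outliers themselves, so real analyses often prefer more robust measures; this sketch only shows the basic idea.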

____ is commonly used in predictive and prescriptive analytics to train the system to make decisions based on historical data

Artificial Intelligence (AI)

Data engineers:

Build the IT infrastructure and architecture for the growth of data. Support data analysts and data scientists by providing them with tools and access to data

____ are descriptive components of a data set

Categorical values

____ are the descriptive components within a data set; for example, the gender identity of an employee is a categorical value

Categorical values

____ refers to the various characteristics of a data set, including:

Categorical values: descriptive components; quantitative values: numeric data points that can be summed, counted, or otherwise analyzed using mathematical operations

____ is the categorization of data into groups based on similarities found in a data label that was previously identified; this differs from clustering in the type of machine learning (ML) used: - clustering uses unsupervised ML to analyze unlabeled data inputs - classification uses supervised ML to analyze labeled data inputs

Classification analysis

____ is an analytics technique that categorizes data points into groups based on their similarities

Clustering or cluster analysis

Data analysts:

Collect, manipulate, and analyze data from across a business; understand business operations, including business processes, as well as technical aspects of a business

Design concepts include:

Color, white space, typography, iconography

___ refers to the first two key factors of exploratory data analytics: categorical values and quantitative values

Data composition

______ refers to the various characteristics of a data set that include categorical values and quantitative values

Data composition

Exploring a new data set and understanding its composition commonly involves ____, a data analytics technique

Data summarization

Exploratory analytics techniques include:

Data summarization, clustering, classification analysis

______ is the presentation of data in a graphical format, such as charts and graphs, that is used for analysis and communication; it turns complete, accurate, and reliable data into a story

Data visualization

____ is the value to be understood, often called the outcome

Dependent variable

Categories of data analytics:

Descriptive, diagnostic, predictive, prescriptive

____ is a central idea or theme of a visualization that drives the design's meaning and tone

Design concept

Data visualizations are based on ____

Design concepts

____ is data about activities in a system and includes the time stamp of when the activities occur

Event log data

____ data analytics reveals key characteristics of a data set

Exploratory

____ reveals the key characteristics of a data set; this helps us identify three key factors of a data set: categorical values, quantitative values, and value patterns or trends

Exploratory data analytics

______ reveals key characteristics of a data set

Exploratory data analytics

Reinforcement learning:

Focuses on decision making by rewarding desired behaviors and/or punishing undesired behaviors
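
The reward-and-punish loop can be sketched with a tiny "bandit" problem, in which a program repeatedly chooses between options and learns from the rewards it receives. This is an illustrative sketch, not a method from these notes: the epsilon-greedy strategy, the function name `train_bandit`, and the reward probabilities are all assumptions.

```python
import random


def train_bandit(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Learn an estimated value for each action from rewards (+1) and punishments (-1)."""
    rng = random.Random(seed)
    estimates = [0.0] * len(reward_probs)
    counts = [0] * len(reward_probs)
    for _ in range(steps):
        if rng.random() < epsilon:
            # Occasionally explore a random action.
            action = rng.randrange(len(reward_probs))
        else:
            # Otherwise choose the action that currently looks best.
            action = max(range(len(reward_probs)), key=lambda i: estimates[i])
        # Reward the decision (+1) or punish it (-1).
        reward = 1 if rng.random() < reward_probs[action] else -1
        counts[action] += 1
        # Update the running average reward for the chosen action.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates


# Hypothetical setup: action 1 is rewarded far more often than action 0.
estimates = train_bandit([0.2, 0.8])
print(estimates)
```

Each decision is made in sequence, and the running averages steer later decisions toward the action that has been rewarded more often.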

___ is the process of estimating future events based on the combination of past and present time series data; a predictive analytic method

Forecasting

Geospatial analysis:

Gathers, transforms, and visualizes geographic data and imagery, including satellite photographs and GPS coordinates; banks use geospatial data to track credit card transactions and flag suspicious transactions

_____ is the factor that may be influencing the dependent variable; there can be one or more independent variables, depending on the type of regression performed

Independent variable

Examples of process mining:

- Investment firms: process mining can be used to identify unusual activities of traders and portfolio managers, such as a trade being executed unexpectedly before it has been approved
- Journal entries: process mining can be used to show the chain of command of journal entries, including originator, poster, and approver, in order to satisfy segregation of duties
- Information Technology (IT): events for IT requests can be analyzed for processes like change management, addition of new users to a system, and system error reports

Explanatory data analytics techniques include:

Linear regression, forecasting, Monte Carlo simulation

____ helps a system learn from data to create rules and categories that enable it to make predictions about future data; it mimics human cognitive functions to learn and solve problems without human involvement

Machine Learning (ML)

____ uses algorithms and statistical models to train an AI system through patterns and trends in data sets.

Machine Learning (ML)

Monte Carlo Simulation:

Measures the sensitivity of changes in a simulation based on the existence of random variables; it can help predict cash flow
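
A cash flow simulation along these lines might look like the sketch below. The distributions, dollar amounts, and the function name `simulate_cash_flow` are hypothetical assumptions chosen for illustration; a real model would use inputs drawn from the business.

```python
import random


def simulate_cash_flow(trials=10_000, seed=42):
    """Run many random scenarios and summarize the resulting net cash flow."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    results = []
    for _ in range(trials):
        # Assumption: revenue is roughly normal, costs are uniform in a range.
        revenue = rng.gauss(100_000, 15_000)
        costs = rng.uniform(60_000, 90_000)
        results.append(revenue - costs)
    results.sort()
    average = sum(results) / trials
    low = results[int(0.05 * trials)]   # 5th percentile outcome
    high = results[int(0.95 * trials)]  # 95th percentile outcome
    return average, low, high


average, low, high = simulate_cash_flow()
print(f"average {average:,.0f}, 5th pct {low:,.0f}, 95th pct {high:,.0f}")
```

Changing one input distribution and rerunning shows how sensitive the predicted cash flow is to that random variable.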

___ has one dependent variable and multiple independent variables

Multiple regression

The third key factor that data exploration reveals is:

Patterns of recurring or similar values, either categorical or quantitative; patterns that occur over time are called trends

___ is a popular method of summarizing smaller data sets

Pivot tables

Any part of the business that captures event data is a candidate for:

Process mining

____ uses event log data to show what individuals, systems, and machines are doing in a visual format

Process mining

Advanced data analytics techniques:

Process mining, network analysis, geospatial analysis, natural language processing (NLP)

Statisticians:

Provide a methodology for drawing conclusions from data; math oriented, with a focus on collection and interpretation of quantitative data using defined scientific methods

____ are the numerical data points that can be summed, counted, or otherwise analyzed using mathematical operations. Dates and dollar amounts are two examples of this.

Quantitative values

_____ are numeric data points that can be summed, counted or otherwise analyzed using mathematical operations in a data set

Quantitative values

___ has one dependent variable and only one independent variable

Simple regression

Data summarization:

Simplifies data to quickly identify trends by compressing the data into smaller, easier-to-understand outputs such as charts or tables; for example, in Excel a pivot table can be used to summarize sales order data based on location, type of product, and date of sale
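
The same kind of summary a pivot table produces can be sketched in a few lines of code. The order records, field names, and the `summarize` helper below are hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical sales order data.
orders = [
    {"location": "East", "product": "Widget", "amount": 100},
    {"location": "East", "product": "Gadget", "amount": 150},
    {"location": "West", "product": "Widget", "amount": 200},
    {"location": "West", "product": "Widget", "amount": 100},
]


def summarize(rows, key, value):
    """Total the `value` field for each distinct `key`, like one pivot table row field."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row[value]
    return dict(totals)


print(summarize(orders, "location", "amount"))
print(summarize(orders, "product", "amount"))
```

Four detail rows compress into two totals per view, which is the "smaller, easier-to-understand output" the definition describes.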

Linear regression:

A statistical technique that estimates the relationship between one dependent variable and one or more independent variables; it does not establish cause-and-effect relationships between the variables; it only estimates the existence of a relationship between them

ML Approaches:

Supervised learning, unsupervised learning, reinforcement learning

The three types of machine learning are:

Supervised, unsupervised, and reinforcement learning

The three types of ML:

- Supervised: uses labeled data sets to train the algorithm
- Unsupervised: uses algorithms to analyze unlabeled data sets for hidden patterns
- Reinforcement: focuses on decision making by rewarding desired behaviors and/or punishing undesired behaviors

Descriptive analytics

Tells us what happened; looks at historical data and condenses it into smaller, more meaningful bits of information; the most common form of data analytics

Descriptive analytics:

Tells us what happened; looks at historical data and condenses it into smaller, more meaningful bits of information; the most common form of analytics

Forecasting:

The process of estimating future events based on a combination of past and present data; this predictive analytic method uses statistics and is a staple in accounting data analytics; it's common in sales forecasting, where historical sales data is used to create a forecast of future sales
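
A very simple sales forecast can be sketched as a moving average of recent history. This is one basic technique among many and is not named in these notes; the function name `moving_average_forecast`, the window of 3, and the sales figures are assumptions for illustration.

```python
def moving_average_forecast(history, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    if len(history) < window:
        raise ValueError("not enough history for the chosen window")
    recent = history[-window:]
    return sum(recent) / window


# Hypothetical monthly sales, oldest first.
monthly_sales = [120, 130, 125, 140, 150, 145]
print(moving_average_forecast(monthly_sales))  # mean of 140, 150, and 145
```

A moving average ignores trend and seasonality, so it is best read as a baseline rather than a finished forecast.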

____ occurs in chronological order across a period of time

Time series

Diagnostic analytics:

Uses data mining to provide insights into why something happened; drills down to a granular level; looks at data in a variety of ways to identify trends or investigate causes

Unsupervised learning:

Uses algorithms to analyze unlabeled data sets for hidden patterns

Diagnostic analytics

Uses data mining to provide insights into why something happened; drills down to the granular level; looks into the data in a variety of ways to identify trends or investigate causes

Process mining:

Uses event log data to show what individuals, systems, and machines are doing in a visual format; for example, it can reveal that a purchasing agent creates purchase orders and also approves them, putting the business at high risk for purchasing fraud
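
The purchasing example can be sketched as a check over event log data. The sketch covers only the underlying test, not the visual output a process mining tool would produce; the log records, field names, and the function `same_user_create_and_approve` are hypothetical.

```python
from collections import defaultdict

# Hypothetical event log: each event records the activity, the user, and a time stamp.
event_log = [
    {"case": "PO-1", "activity": "create", "user": "alice", "timestamp": "2024-01-02T09:00"},
    {"case": "PO-1", "activity": "approve", "user": "bob", "timestamp": "2024-01-02T11:00"},
    {"case": "PO-2", "activity": "create", "user": "carol", "timestamp": "2024-01-03T10:00"},
    {"case": "PO-2", "activity": "approve", "user": "carol", "timestamp": "2024-01-03T10:05"},
]


def same_user_create_and_approve(log):
    """Return purchase orders that were created and approved by the same user."""
    users_by_case = defaultdict(dict)
    for event in log:
        users_by_case[event["case"]][event["activity"]] = event["user"]
    return [
        case
        for case, activities in users_by_case.items()
        if "create" in activities and activities["create"] == activities.get("approve")
    ]


print(same_user_create_and_approve(event_log))
```

Any order returned here deviates from the expected process path, where a different person approves what the purchasing agent created.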

Predictive analytics:

Uses statistical modeling and algorithms to predict what is likely to happen; provides powerful tools to assist decision making and inform future actions; requires assumptions and identifies possible outcomes based on these assumptions

Predictive analytics

Uses statistical modeling and algorithms to predict what is likely to happen; provides powerful tools to assist in decision making and inform future actions; requires assumptions and identifies possible outcomes based on these assumptions

Classification analysis:

Uses supervised machine learning to categorize labeled data into groups based on predefined labels
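
A labeled training set and a simple classifier can be sketched as follows. The nearest-centroid rule is just one basic supervised technique chosen for brevity; the labels, values, and function names are hypothetical.

```python
def train_centroids(labeled_points):
    """Average the feature value for each predefined label in the training data."""
    sums, counts = {}, {}
    for value, label in labeled_points:
        sums[label] = sums.get(label, 0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def classify(value, centroids):
    """Assign the label whose average is closest to the new value."""
    return min(centroids, key=lambda label: abs(value - centroids[label]))


# Hypothetical labeled data: a person assigned each label beforehand.
training = [(10, "low risk"), (12, "low risk"), (95, "high risk"), (100, "high risk")]
centroids = train_centroids(training)
print(classify(20, centroids))
print(classify(90, centroids))
```

The labels exist before training begins, which is the human intervention that separates classification from clustering.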

Prescriptive analytics

Uses the three other analytics types to gain insights; identifies what we should do; consists of advanced, cutting-edge technology and design; uses a decision-making protocol and historical data to train a program on what to do in a real-time situation; requires advanced programming skills

Prescriptive analytics:

Uses the three previous analytics types to gain insights. Identifies what we should do; consists of advanced, cutting-edge technology and design; uses a decision-making protocol and historical data to train a program on what to do in a real-time situation; requires advanced programming skills

Clustering:

Uses unsupervised machine learning to categorize unlabeled data into groups based on similarities
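
Grouping unlabeled values by similarity can be sketched with a small one-dimensional k-means routine. K-means is one common clustering algorithm and is an assumption here, as are the function name `k_means_1d`, the starting centers, and the sample amounts.

```python
def k_means_1d(points, k=2, iterations=10):
    """Group numbers into k clusters by repeatedly moving each center to its members' mean."""
    ordered = sorted(points)
    # Assumption: spread the starting centers evenly across the sorted values.
    step = (len(ordered) - 1) / max(k - 1, 1)
    centers = [ordered[round(i * step)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest center.
            nearest = min(range(k), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
    return clusters


# Hypothetical unlabeled invoice amounts.
amounts = [10, 12, 11, 95, 100, 98]
print(k_means_1d(amounts))
```

No labels are supplied; the two groups emerge only from how close the values are to one another.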

Network analysis:

Visualizes relationships among participants in a data set to learn about the social structure based on those relationships; mostly used to study relationships among people, who are displayed as nodes in a network analysis; the links between the nodes are the relationships, or interactions, that connect the participants; used by banks to identify fraud risks by investigating transactions related to reported fraudulent transactions
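
Nodes and links can be represented directly in code before any visualization is drawn. In this sketch the participants, the links between them, and the `degree_by_node` helper are hypothetical; counting links per node is just one simple measure of how connected a participant is.

```python
from collections import defaultdict

# Hypothetical links: each pair is a relationship between two participants (nodes).
links = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]


def degree_by_node(edges):
    """Count how many links touch each node."""
    degree = defaultdict(int)
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    return dict(degree)


degrees = degree_by_node(links)
print(degrees)
print(max(degrees, key=degrees.get))  # the most connected participant
```

In a fraud investigation, a node with unusually many links to flagged transactions would be a natural place to look first.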

To create an effective visualization, the designer should be aware of:

- What an effective business presentation should look like
- How a business stakeholder may interpret a visualization
- The appropriate level of detail for the business need
- The value of white space

____ is negative space that creates a visual pause in a visualization; its benefits include improving comprehension by avoiding distracting visual elements, reducing cognitive load by avoiding clutter and distractions, focusing attention on the message by isolating it, drawing attention to interactive opportunities like filters and drill-downs, balancing important visual elements in an organized manner, and communicating more clearly by separating unrelated elements into sections

White space

Business or business intelligence analysts:

Work with data to find trends and leverage that information to improve operations

Data scientists:

Work with large volumes of data; design algorithms to collect and analyze data and conduct predictive analytics; require expert technical skills in statistics and coding

The simple linear regression equation:

Y = A + BX
Y = dependent variable
X = independent variable
A = Y-intercept (value of Y when X = 0)
B = slope of the line
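
The intercept A and slope B can be estimated from data. The sketch below assumes the ordinary least squares method, which these notes do not name explicitly; the function name `fit_simple_regression` and the sample figures are hypothetical.

```python
def fit_simple_regression(xs, ys):
    """Estimate A (intercept) and B (slope) in Y = A + BX by ordinary least squares."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope: how X and Y vary together, relative to how much X varies.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b = sxy / sxx
    # Intercept: the value of Y when X = 0.
    a = y_mean - b * x_mean
    return a, b


# Hypothetical data that lies exactly on the line Y = 3 + 2X.
ad_spend = [1, 2, 3, 4, 5]   # independent variable X
sales = [5, 7, 9, 11, 13]    # dependent variable Y
a, b = fit_simple_regression(ad_spend, sales)
print(a, b)
```

The fitted line estimates a relationship between the two variables; it does not show that one causes the other.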

categories of data analysis:

descriptive, diagnostic, prescriptive, predictive

Categories of data analytics

descriptive, diagnostic, predictive, prescriptive

_____ is a technique that gathers, transforms, and visualizes geographic data and imagery, including satellite photographs, Global Positioning System (GPS) coordinates, etc.

geospatial analytics

Process mining visualizes event data and makes it easier to:

identify deviations in an expected process path

___ contains predefined tags or descriptors that an ML algorithm uses to understand the data set or learn from it; labeling is an act of human intervention, which occurs only when using supervised machine learning analytics

labeled data

____ is a statistical technique we use to estimate the relationships between a dependent variable and one or more independent variables.

linear regression

____ is an estimate derived from observed data that shows a range of possible values

confidence interval (the range between the lower-confidence bound and the upper-confidence bound)
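
A lower and upper bound around a sample mean can be sketched as below. This assumes a 95% confidence level and a normal approximation (the 1.96 multiplier), neither of which comes from these notes; small samples would normally call for a t-distribution instead. The function name and sample values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def confidence_interval_95(sample):
    """Return (lower bound, upper bound) for the mean, using a normal approximation."""
    center = mean(sample)
    # Assumption: 1.96 standard errors on each side covers about 95%.
    margin = 1.96 * stdev(sample) / sqrt(len(sample))
    return center - margin, center + margin


# Hypothetical observed values.
observations = [10, 12, 11, 13, 9, 11]
lower, upper = confidence_interval_95(observations)
print(lower, upper)
```

The wider the spread of the observed data, or the fewer the observations, the wider the range between the two bounds.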

____ is a form of textual analysis that gathers, processes, and interprets meaning from human language

natural language processing (NLP)

____ is an analysis technique that visualizes relationships among participants in a data set to learn about the social structure those relationships create; can be used to study relations among people, who are displayed as nodes in a network analysis

network analysis

The links between the ____ are the relationships, or interactions, that connect the participants

nodes

___ is an additional movement in the time series data that cannot be explained as a trend or seasonality; a drastic spike in revenue at the end of February due to a large customer order for a one-time corporate event is an example

noise

____ is a consistent movement in the time series data that repeats on a regular basis; an example is an increase in revenue every November and December due to winter holiday sales

seasonality

_____ is a type of network analysis that investigates social structures on social media

social network analysis

_____ captures data that occurs in chronological order across a period of time

time series

____ is a consistent movement in the time series data that does not repeat

time trend

Supervised learning:

uses labeled data sets to train the algorithm

