CIS 190 - Chapter 6

Four Vs of big data

Volume, Velocity, Variety, Veracity

Geographic information system (GIS)

a computer-based tool that captures, stores, manipulates, analyzes, and visualizes geographic data on a map.

Affinity analysis

a data mining technique that discovers co-occurrence relationships among activities performed by specific individuals or groups.
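
As a rough illustration of co-occurrence counting, the sketch below tallies how often pairs of items appear together in the same transaction. The basket data and the `pair_counts` helper are hypothetical and invented for this example; real affinity analysis tools typically add measures such as confidence and lift.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, one set of items per customer visit.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "cereal"},
    {"bread", "butter", "cereal"},
]

def pair_counts(transactions):
    """Count how many transactions contain each unordered pair of items."""
    counts = Counter()
    for items in transactions:
        # Sort so that each pair is always stored in the same order.
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    return counts

counts = pair_counts(baskets)
top_pair, top_count = counts.most_common(1)[0]
# Support: the share of all transactions that contain the pair.
support = top_count / len(baskets)
print(top_pair, top_count, support)
```

Here bread and butter co-occur in three of the four baskets, so that pair has the highest support.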

Big data

a data set that is too large or complex to be analyzed using traditional data processing applications.

Satisficing

a decision-making strategy that involves searching through available alternatives until an acceptable solution is found. The term is a blend of the words "satisfy" and "suffice".

Resilient distributed dataset (RDD)

a fault-tolerant, immutable and distributed collection of objects that can be processed in parallel across a cluster.

Self-service analytics

a form of BI that enables and encourages managers and other users to perform queries and generate reports with nominal IT support.

Dashboard

a graphical user interface that provides at-a-glance views of relevant KPIs to an organization or department.

Time-series regression

a model that estimates the direction a variable is trending over time.

Data science

a multi-disciplinary field that uses domain expertise, scientific methods, programming skills, algorithms and statistics to extract knowledge and insights from structured, semi-structured and unstructured big data sets to predict future behavior and prescribe actions.

Predictive modeling

a process that uses data mining and probability to create a statistical model that forecasts outcomes.

Business intelligence (BI)

a set of best practices, software, infrastructure and tools to acquire and transform raw highly structured data into actionable insights to help managers at all levels of the organization make informed business decisions.

Digital dashboard

a static or interactive electronic interface used to acquire and consolidate data across an organization.

Linear regression

a statistical method that analyzes and finds relationships between a dependent variable and one or more independent (or explanatory) variables. Simple linear regression has one explanatory variable. Multiple linear regression has two or more explanatory variables.
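
A minimal sketch of simple linear regression (one explanatory variable) fitted by ordinary least squares. The function name and the advertising/sales figures are assumptions made up for illustration, not data from the course.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: advertising spend (independent) and sales (dependent).
ad_spend = [1.0, 2.0, 3.0, 4.0]
sales = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_simple_linear_regression(ad_spend, sales)
predicted = slope * 5.0 + intercept  # forecast for a spend of 5.0
print(slope, intercept, predicted)
```

Multiple linear regression follows the same idea with two or more explanatory variables and is usually fitted with a statistics library rather than by hand.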

Data product

a technical function that encapsulates an algorithm and is designed to integrate directly into core applications.

Trended time series

a time series in which the mean value is not constant over time; it can rise, fall, or fluctuate by season.

Constant time series

a time series in which the mean value of the time series is constant over time.

Modern BI

allows users to produce reports and analyses on the fly and share data with other users to make decisions and optimize business results.

Citizen data scientist

an employee in an organization who can use advanced data analytic methods and techniques and software to create data models but has not been formally trained as a data scientist.

Predictive model

a model that is based on several factors likely to influence future behavior and that predicts, at some confidence level, the outcome of an event.

Veracity

concerned with repairing data that are incomplete, missing, or duplicated.

Volume

concerned with handling the sheer volume of "big data" and providing comprehensive analytics capabilities in the big data platform.

Variety

concerned with the fact that the analytic environment has expanded from pulling mostly structured data from a single enterprise data warehouse to include a variety of semi-structured and unstructured sources such as social media posts, tweets, videos, images, sensor data, and customer service calls.

Descriptive data analytics

creates a summary of historical data to yield useful information and possibly prepare the data for more sophisticated future analysis.
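
A small sketch of a descriptive summary, using Python's standard `statistics` module. The `monthly_orders` figures are hypothetical and exist only to show what summarizing historical data can look like.

```python
import statistics

# Hypothetical historical data: orders received in each of six months.
monthly_orders = [120, 135, 128, 160, 142, 155]

# Summarize what has already happened; no prediction is involved.
summary = {
    "count": len(monthly_orders),
    "mean": statistics.mean(monthly_orders),
    "median": statistics.median(monthly_orders),
    "min": min(monthly_orders),
    "max": max(monthly_orders),
}
print(summary)
```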

Geospatial data

data that have an explicit geographic component, ranging from vector and raster data to tabular data with site locations.

Rules-based decision-making

decision-making that helps novices make decisions like an expert.

Prescriptive data analytics

is dedicated to finding the best course of action among various choices given the known parameters.

Machine learning

scientific algorithms that identify patterns in big data to learn from the data and create insights based on the data.

Drill down

searching for something on a computer by moving from general information to more detailed information, focusing on something of interest; for example, from quarterly sales to monthly sales to daily sales.
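
The quarterly-to-monthly progression can be sketched as successive aggregations of the same records at finer levels of detail. The sales records and the `total_by` helper are hypothetical, chosen only to illustrate the idea.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily sales records: (date, amount).
daily_sales = [
    (date(2024, 1, 15), 100),
    (date(2024, 1, 20), 150),
    (date(2024, 2, 3), 200),
    (date(2024, 4, 9), 50),
]

def quarter_of(day):
    """Map a date to its calendar quarter (1 to 4)."""
    return (day.month - 1) // 3 + 1

def total_by(records, key):
    """Sum amounts grouped by key(day)."""
    totals = defaultdict(int)
    for day, amount in records:
        totals[key(day)] += amount
    return dict(totals)

# General view: totals per quarter.
by_quarter = total_by(daily_sales, lambda d: (d.year, quarter_of(d)))

# Drill down into Q1: totals per month within that quarter.
q1_records = [(d, amt) for d, amt in daily_sales if quarter_of(d) == 1]
by_month_in_q1 = total_by(q1_records, lambda d: (d.year, d.month))

print(by_quarter)
print(by_month_in_q1)
```

Drilling one level further would list the individual daily records for a chosen month.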

Advanced data analytics

the examination of data using sophisticated methods and techniques to discover deeper insights, make predictions and/or generate recommendations.

Bounded rationality

the idea that rationality is limited by the tractability of the decision, cognitive limitations of the mind and time available to make the decision.

Embedded BI

the integration of self-service analytics tools and capabilities within commonly used business software apps.

Data mashup

the integration of two or more data sets from various business systems and external sources without relying on the middle step of ETL (extract, transform, and load) into a data warehouse or help from IT.
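
A minimal sketch of a mashup, assuming one internal data set and one external lookup keyed by ZIP code. All names and values here (`crm_customers`, `external_income_by_zip`, the income figure) are hypothetical; the point is only that the two sources are combined directly, without loading them into a data warehouse first.

```python
# Hypothetical internal data, e.g. exported from a CRM system.
crm_customers = [
    {"customer_id": 1, "name": "Acme", "zip": "10001"},
    {"customer_id": 2, "name": "Globex", "zip": "60601"},
]

# Hypothetical external data keyed by ZIP code (values are made up).
external_income_by_zip = {"10001": 72000}

def mashup(customers, income_lookup):
    """Attach the external income figure to each customer record, if available."""
    return [
        {**customer, "median_income": income_lookup.get(customer["zip"])}
        for customer in customers
    ]

combined = mashup(crm_customers, external_income_by_zip)
for row in combined:
    print(row)
```

Customers with no match in the external source keep a `median_income` of `None` rather than being dropped.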

Data visualization

the presentation of data in a graphical format to make it easier for decision-makers to grasp difficult concepts or identify new patterns in the data.

Decision optimization

the process of calculating values of variables that lead to an optimal value of the event under investigation.

Text mining

the process of deriving high quality information from text aided by software that can identify concepts, patterns, topics, keywords and other attributes in the unstructured data.

Data analytics

the process of examining data sets to draw conclusions about the information they contain, usually with the help of computer software.

Optimizing

the process of finding the alternative that is most cost effective or produces the best achievable performance under given constraints by maximizing desired effects and minimizing undesired effects.
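
The contrast between optimizing and satisficing can be sketched with a toy choice among alternatives. The options, their costs, and the acceptance threshold are hypothetical: satisficing stops at the first acceptable option, while optimizing examines every option to find the best one.

```python
# Hypothetical alternatives as (name, cost) pairs; lower cost is better.
alternatives = [("A", 120), ("B", 95), ("C", 80), ("D", 99)]

def satisfice(options, max_cost):
    """Return the first option whose cost is acceptable, or None if there is none."""
    for name, cost in options:
        if cost <= max_cost:
            return name
    return None

def optimize(options):
    """Return the lowest-cost option after examining all of them."""
    return min(options, key=lambda option: option[1])[0]

good_enough = satisfice(alternatives, max_cost=100)
best = optimize(alternatives)
print(good_enough, best)  # B C
```

Satisficing settles on B because it is the first option at or under the threshold, even though C is cheaper.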

Geocoding

the process of reading input text such as an address and converting it to output in the form of a latitude/longitude coordinate.

Data discovery

the process of using BI to collect data from various databases and consolidate it into a single source that can be easily and instantly evaluated.

Predictive data analytics

the process of using data analytics methods and techniques to model and make predictions about unknown events from data.

Data mining

the process of using software to analyze unstructured, semi-structured and structured data from various perspectives, categorize them, and derive correlations or patterns among fields in the data.

Velocity

the speed with which data are stored and analyzed and reports are generated.

Cognitive computing

the technology that uses machine learning algorithms.

Augmented analytics

the use of machine learning and AI in BI tools to automate data preparation and help users discover and share insights.

Augmented reality (AR)

the use of more contemporary 3-D visualization methods and techniques to illustrate the relationships within data, including smart mapping, smart routines, machine learning, and natural language processing.

Sentiment analysis

uses natural language processing, text analysis, computational linguistics and biometrics to systematically identify, extract and quantify affective states and subjective information.

Data science lifecycle

1. capture 2. store 3. model 4. analyze 5. communicate 6. deploy 7. reiterate

Phases of decision-making

1. intelligence 2. design 3. choice 4. review

