MIS chapter 6

variety

-Different forms of structured and unstructured data
-data from spreadsheets and databases as well as from email, videos, photos, and PDFs, all of which must be analyzed

velocity

-the analysis of streaming data as it travels around the Internet
-analysis of social media messages spreading globally is necessary

Volume

-the scale of data
-includes enormous volumes of data generated daily
-massive volume created by machines and networks
-big data tools necessary to analyze zettabytes and brontobytes

Veracity

-the uncertainty of data, including biases, noise, and abnormalities
-uncertainty or untrustworthiness of data
-data must be meaningful to the problem being analyzed
-must keep data clean and implement processes to keep dirty data from accumulating in systems

Advanced Data Analytics (techniques a data scientist will use to perform big data advanced analytics)

1) Behavioral Analysis
2) Correlation Analysis
3) Exploratory Data Analysis
4) Pattern Recognition Analysis
5) Social Media Analysis
6) Speech Analysis
7) Text Analysis
8) Web Analysis

Data Mining Process

1) Business Understanding
2) Data Understanding
3) Data Preparation
4) Data Modeling
5) Evaluation
6) Deployment

Data Mining techniques

1) Estimation Analysis
2) Affinity Grouping Analysis
3) Cluster Analysis
4) Classification Analysis

The four primary reasons for low-quality information

1. Online customers intentionally enter inaccurate info to protect their privacy
2. Different systems have different info entry standards and formats
3. Data entry personnel enter abbreviated information to save time or erroneous information by accident
4. Third-party and external information contains inconsistencies, inaccuracies, and errors

Four Common Characteristics of Big Data

1. Variety
2. Veracity
3. Volume
4. Velocity

A recommendation engine

A data mining algorithm that analyzes a customer's purchases and actions on a website and then uses the data to recommend complementary products.

Primary Key

A field (or group of fields) that uniquely identifies a given entity in a table

Foreign Key

A primary key of one table that appears as an attribute in another table and acts to provide a logical relationship between the two tables
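
The primary key and foreign key definitions above can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The `customer` and `orders` tables and their columns are hypothetical names chosen for illustration; the sketch assumes a standard SQLite build, where foreign keys are enforced only after the pragma is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# customer_id is the primary key: it uniquely identifies each customer.
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
# orders.customer_id is a foreign key: the customer primary key appearing in another table.
conn.execute("""CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    total REAL)""")

conn.execute("INSERT INTO customer VALUES (1, 'Ada')")
conn.execute("INSERT INTO orders VALUES (100, 1, 25.0)")

def insert_orphan_order(connection):
    """Try to add an order for a customer that does not exist."""
    try:
        connection.execute("INSERT INTO orders VALUES (101, 999, 10.0)")
        return True
    except sqlite3.IntegrityError:
        return False  # the foreign key rejected the orphan row

orphan_accepted = insert_orphan_order(conn)
```

Here the order for the unknown customer 999 is rejected, which is the logical relationship between the two tables at work.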

Information cleansing or scrubbing

A process that weeds out and fixes or discards inconsistent, incorrect, or incomplete information
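
As a rough sketch of the idea, the function below fixes inconsistent formatting and discards incomplete, incorrect, or duplicate records. The field names (`name`, `email`) and the simple "must contain @" check are assumptions made for illustration, not rules from the chapter.

```python
def cleanse(records):
    """Return cleaned records: consistent formats, no incomplete rows, no duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        # Fix inconsistent formatting (stray spaces, mixed case).
        name = (rec.get("name") or "").strip().title()
        email = (rec.get("email") or "").strip().lower()
        if not name or "@" not in email:
            continue  # discard incomplete or clearly incorrect records
        if email in seen:
            continue  # discard duplicates of a record already kept
        seen.add(email)
        cleaned.append({"name": name, "email": email})
    return cleaned

raw = [
    {"name": "  ada lovelace ", "email": "ADA@example.com"},
    {"name": "Ada Lovelace", "email": "ada@example.com"},
    {"name": "", "email": "nobody@example.com"},
    {"name": "Grace Hopper", "email": "not-an-email"},
]
clean = cleanse(raw)
```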

Optimization model.

A statistical process that finds the way to make a design, system, or decision as effective as possible.

Prediction

A statement about what will or might happen in the future. 3 common data mining techniques:

Five Common Characteristics of High-Quality Information

ACCURATE: is there an incorrect value in the information?
COMPLETE: is a value missing from the information?
CONSISTENT: is aggregate or summary information in agreement with detailed info?
TIMELY: is the info current with respect to business needs?
UNIQUE: is each transaction and event represented only once in the information?

Social Media Analysis

Analyzes text flowing across the Internet, including unstructured text from blogs and messages

Business Focus Areas of Big Data

-Data Mining
-Data Analysis
-Data Visualization

Correlation Analysis

Determines a statistical relationship between variables, often for the purpose of identifying predictive factors among the variables
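
One common way to measure such a relationship is the Pearson correlation coefficient, sketched below in plain Python. The chapter does not name a specific statistic, so Pearson's r is an assumption here, and the `ad_spend` and `sales` figures are made-up illustration data.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation: +1 is a perfect positive relationship, -1 a perfect negative one."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

ad_spend = [1, 2, 3, 4, 5]   # hypothetical values
sales = [2, 4, 6, 8, 10]     # hypothetical values, exactly double ad_spend
r = pearson_r(ad_spend, sales)
```

A value of r close to 1 would suggest ad spend is a candidate predictive factor for sales, although correlation alone does not show cause.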

Estimation Analysis

Determines values for an unknown continuous variable behavior or estimated future value

Exploratory Data Analysis

Identifies patterns in data, including outliers, uncovering the underlying structure to understand relationships between the variables.

Reasons Business Analysis Is Difficult from Operational Databases

-Inconsistent Data Definitions
-Lack of Data Standards
-Poor Data Quality
-Inadequate Data Usefulness
-Ineffective Direct Data Access

Classification Analysis

The process of organizing data into categories or groups for its most effective and efficient use.

data element (data field)

The smallest or basic unit of information. Can include a customer's name, address, email, discount rate, preferred shipping method, product name, quantity ordered.

Data Artist

a business analytics specialist who uses visual tools to help people understand complex data

data broker

a business that collects personal information about consumers and sells that information to other organizations

Repository

a central location in which data is stored and managed

Big Data

a collection of large, complex data sets, including structured and unstructured data, which cannot be analyzed using traditional database methods and tools

record

a collection of related data elements

Outlier

a data value that is numerically distant from most of the other data points in a set of data.
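
A simple way to make "numerically distant" concrete is to flag values that lie more than a chosen number of standard deviations from the mean. The threshold of 2 and the sample values below are assumptions for illustration; other rules (such as interquartile-range fences) are equally valid.

```python
from statistics import mean, pstdev

def find_outliers(values, threshold=2.0):
    """Return values whose distance from the mean exceeds threshold standard deviations."""
    center = mean(values)
    spread = pstdev(values)
    if spread == 0:
        return []  # all values identical, so nothing is distant
    return [v for v in values if abs(v - center) / spread > threshold]

daily_orders = [10, 12, 11, 13, 12, 11, 95]  # hypothetical data with one unusual day
outliers = find_outliers(daily_orders)
```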

data warehouse

a logical collection of info, gathered from many different operational databases, that supports business analysis activities and decision-making tasks. The main purpose is to combine info throughout an organization into a single repository.

information integrity

a measure of the quality of information

extraction, transformation, and loading (ETL)

a process that extracts info from internal and external databases, transforms it using a common set of enterprise definitions, and loads it into a data warehouse. The data warehouse then sends portions (or subsets) of the info to data marts.
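
The three ETL steps can be sketched as three small functions. Everything named here (the two source systems, their field names, and the `fact_sales` table) is hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import sqlite3

# Hypothetical source systems with different entry standards and formats.
sales_system = [{"cust": "Ada Lovelace", "amount_usd": "25.00"}]
web_system = [{"customer_name": "GRACE HOPPER", "amount_cents": 1050}]

def extract():
    """Pull raw rows from each source, remembering where each came from."""
    return [("sales", r) for r in sales_system] + [("web", r) for r in web_system]

def transform(source, row):
    """Map each source row onto one common definition: (customer, amount in dollars)."""
    if source == "sales":
        return row["cust"].title(), float(row["amount_usd"])
    return row["customer_name"].title(), row["amount_cents"] / 100

def load(rows):
    """Load the transformed rows into a stand-in warehouse table."""
    warehouse = sqlite3.connect(":memory:")
    warehouse.execute("CREATE TABLE fact_sales (customer TEXT, amount REAL)")
    warehouse.executemany("INSERT INTO fact_sales VALUES (?, ?)", rows)
    return warehouse

warehouse = load([transform(src, row) for src, row in extract()])
loaded = warehouse.execute(
    "SELECT customer, amount FROM fact_sales ORDER BY customer"
).fetchall()
```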

regression model

a statistical process for estimating the relationships among variables.
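
The simplest case is a straight-line fit between two variables using ordinary least squares, sketched below. The chapter does not specify a regression method, so the choice of simple linear regression and the sample numbers are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data that follows y = 2x + 1 exactly.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
predicted = slope * 5 + intercept  # estimate y for an unseen x
```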

data lake

a storage repository that holds a vast amount of raw data in its original format until the business needs it

data map

a technique for establishing a match, or balance, between the source data and the target data warehouse. Identifies data shortfalls and recognizes data issues.

Cluster Analysis

a technique used to divide an information set into mutually exclusive groups such that the members of each group are as close together as possible to one another and the different groups are as far apart as possible. Identifies similarities and differences among data sets.
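
One widely used clustering method is k-means, sketched here for one-dimensional data. K-means is a swapped-in example, not a method named by the chapter; the starting centers and sample values are assumptions chosen so the result is easy to follow.

```python
def kmeans_1d(values, centers, iterations=10):
    """Assign each value to its nearest center, then move each center to its group's mean."""
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # An empty group keeps its previous center.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return groups, centers

# Hypothetical purchase amounts forming two obvious groups.
groups, centers = kmeans_1d([1, 2, 3, 20, 21, 22], centers=[1, 22])
```

Each value lands in exactly one group, so the groups are mutually exclusive, with members close to their own center and far from the other one.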

relational database management system

allows users to create, read, update, and delete data in a relational database.

data-driven decision management

an approach to business governance that values decisions that can be backed up with verifiable data. The success of the data-driven approach is reliant upon the quality of the data gathered and the effectiveness of its analysis and interpretation.

dynamic catalog

an area of a website that stores information about products in a database. Stores dynamic website info

data point

an individual item on a graph or a chart

data-driven website

an interactive website kept constantly updated and relevant to the needs of its customers using a database

Web Analysis

analyzes unstructured data associated with websites to identify consumer behavior and website navigation

Text analysis

analyzes unstructured data to find trends and patterns in words and sentences

Algorithms

are mathematical formulas placed in software that perform an analysis on a data set

integrity constraints

are rules that help ensure the quality of information

Attributes (also called columns or fields)

are the data elements associated with an entity

Structured Query Language (SQL)

asks users to write lines of code to answer questions against a database. Managers typically interact with QBE tools, and MIS professionals have the skills required to code SQL.
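
As a small illustration of answering a question with SQL, the sketch below runs a query through Python's built-in sqlite3 module. The `product` table and its rows are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE product (name TEXT, category TEXT, price REAL)")
db.executemany(
    "INSERT INTO product VALUES (?, ?, ?)",
    [("Pen", "Office", 1.5), ("Desk", "Furniture", 120.0), ("Stapler", "Office", 8.0)],
)

# The question "Which office products cost more than $5?" written as SQL.
rows = db.execute(
    "SELECT name, price FROM product WHERE category = ? AND price > ? ORDER BY name",
    ("Office", 5),
).fetchall()
```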

Comparative Analysis

can compare two or more data sets to identify patterns and trends.

information cube

common term for the representation of multidimensional information

data mart

contains a subset of data warehouse information. Think of data warehouses as having a more organizational focus and data marts as having a functional focus.

Database Management System (DBMS)

creates, reads, updates, and deletes data in a database while controlling access and security. Managers send in requests and the DBMS performs the actual manipulation of the data in the database.

Data warehousing components

-data mart
-information cleansing
-business intelligence

physical view of information

deals with the physical storage of information on a storage device

business rule

defines how a company performs certain aspects of its business and typically results in either a yes/no or true/false answer

Data visualization

describes technologies that allow users to see or visualize data to transform information into a business perspective

data quality audit

determines the accuracy and completeness of its data

Business-critical integrity constraints

enforce business rules vital to an organization's success and often require more insight and knowledge than relational integrity constraints. They tend to mirror the very rules by which an organization achieves success.

dirty data

erroneous or flawed data. Complete removal of dirty data from a source is impractical or virtually impossible.

market basket analysis

evaluates such items as websites and checkout scanner information to detect customers' buying behavior and predict future behavior by identifying affinities among customers' choices of products and services
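
A minimal sketch of finding affinities is to count how often each pair of products appears in the same basket. The baskets below are made-up illustration data, and a real analysis would usually add measures such as support and confidence.

```python
from collections import Counter
from itertools import combinations

def pair_counts(baskets):
    """Count how many baskets contain each pair of products."""
    counts = Counter()
    for basket in baskets:
        # sorted(set(...)) gives each pair one canonical order and ignores repeats.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts

baskets = [  # hypothetical checkout data
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "eggs"],
    ["bread", "butter", "eggs"],
]
counts = pair_counts(baskets)
top_pair, top_count = counts.most_common(1)[0]
```

In this sample, bread and butter are bought together most often, which is the kind of affinity a retailer might use to place or recommend products.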

Data Scientist

extracts knowledge from data by performing statistical analysis, data mining, and advanced analytics on big data to identify trends, market changes, and other relevant information

query by example (QBE)

helps users graphically design the answer to a question against a database

source data

identifies the primary location where data is collected.

dynamic information

includes data that change based on user actions. For example, static websites supply only info that will not change until the content editor changes the info. Dynamic info changes when a user requests info.

static information

includes fixed data incapable of change in the event of a user action

Data Validation

includes the tests and evaluations used to determine compliance with data governance policies to ensure correctness of data. Helps to ensure that every data value is correct and accurate.

Business Advantages of a Relational Database

-increased flexibility
-increased scalability and performance
-reduced information redundancy
-increased information integrity
-increased information security

Database

maintains information about various types of objects (inventory), events (transactions), people (employees), and places (warehouses)

Data visualization tools

move beyond Excel graphs and charts into sophisticated analysis techniques such as controls, instruments, maps, time-series graphs, and more

information integrity issues

occur when a system produces incorrect, inconsistent, or duplicate data. Data integrity issues can cause managers to consider the system reports invalid and make decisions based on other sources.

information inconsistency

occurs when the same data element has different values

Analysis paralysis

occurs when the user goes into an emotional state of over-analysis (or over-thinking) a situation so that a decision or action is never taken, in effect paralyzing the outcome

data set

organized collection of data

Forecasting model

predictions based on time series information, allowing users to manipulate the time series for forecasting activities
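
One very simple forecasting technique is a moving average over the most recent periods, sketched below. The choice of a moving average, the window of 3, and the sales figures are all assumptions for illustration; the chapter does not prescribe a method.

```python
def moving_average_forecast(series, window=3):
    """Forecast the next period as the mean of the last `window` observations."""
    if len(series) < window:
        raise ValueError("series is shorter than the window")
    return sum(series[-window:]) / window

monthly_sales = [100, 110, 120, 130, 140]  # hypothetical time series
forecast = moving_average_forecast(monthly_sales)
```

Changing `window` is one way a user could manipulate the time series to see how the forecast responds.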

infographics (information graphics)

present the results of data analysis, displaying the patterns, relationships, and trends in a graphical format.

Distributed Computing

processes and manages algorithms across many machines in a computing environment. Individual computers are networked together across geographical areas and work together to execute a workload or computing processes as if they were one single computing environment.

Real-time systems

provide real-time information in response to requests. Many organizations use real-time systems to uncover key corporate transactional information. The growing demand for real-time info stems from organizations' needs to make faster and more effective decisions, keep smaller inventories, operate more efficiently, and track performance more carefully.

Metadata

provides details about data. For example, metadata for an image could include its size, resolution, and date created

Information Granularity

refers to the extent of detail within the information (fine and detailed or coarse and abstract)

data steward

responsible for ensuring the policies and procedures are implemented across the organization and acts as a liaison between the MIS department and the business

Affinity Grouping Analysis

reveals the relationship between variables along with the nature and frequency of the relationships

relational integrity constraints

rules that enforce basic and fundamental information-based constraints

Fast Data

the application of big data analytics to smaller data sets in near-real or real-time in order to solve a problem or create business value.

Pattern Recognition Analysis

the classification or labeling of an identified pattern in the machine learning process

Data aggregation

the collection of data from various sources for the purpose of data processing

Virtualization

the creation of a virtual (rather than actual) version of computing resources, such as an operating system, a server, a storage device, or network resources.

information redundancy

the duplication of data, or the storage of the same data in multiple places. One primary goal of a database is to eliminate info redundancy by recording each piece of info in only one place in the database. This saves disk space, makes performing info updates easier, and improves info quality.

content creator

the person responsible for creating the original website content

content editor

the person responsible for updating and maintaining website content

Data Mining

the process of analyzing data to extract info not offered by the raw data alone. Data mining allows companies to compile a complete picture of their operations, all within a single view, allowing them to identify trends and improve forecasts.

Speech analysis

the process of analyzing recorded calls to gather information

Data profiling

the process of collecting statistics and information about data in an existing source. Insights can determine how easy or difficult it will be to use existing data for other purposes along with providing metrics on data quality.

Anomaly detection

the process of identifying rare or unexpected items or events in a data set that do not conform to other items in the data set

Data replication

the process of sharing information to ensure consistency between multiple data sources.

Analytics

the science of fact-based decision making. Advanced analytics uses data patterns to make forward-looking predictions to explain to the organization where it is headed

Business Intelligence Dashboards

track corporate metrics such as critical success factors and key performance indicators and include advanced capabilities such as interactive controls, allowing users to manipulate data for analysis

Data mining tools

use a variety of techniques to find patterns and relationships in large volumes of information that predict future behavior and guide decision making.

Behavioral Analysis

using data about people's behaviors to understand intent and predict future actions

competitive monitoring

when a company keeps tabs on its competitors' activities on the web using software that automatically tracks all competitor website activities such as discounts and new products

