MIS chapter 6
variety
-Different forms of structured and unstructured data -data from spreadsheets and databases as well as from email, videos, photos, and PDFs, all of which must be analyzed
velocity
-the analysis of streaming data as it travels around the Internet -analysis needed for social media messages as they spread globally
Volume
-the scale of data -includes enormous volumes of data generated daily -massive volume created by machines and networks -big data tools necessary to analyze zettabytes and brontobytes
Veracity
-the uncertainty of data, including biases, noise, and abnormalities -uncertainty or untrustworthiness of data -data must be meaningful to the problem being analyzed -must keep data clean and implement processes to keep dirty data from accumulating in systems
Advanced Data Analytics (techniques a data scientist uses to perform advanced big data analytics)
1) Behavioral Analysis 2) Correlation Analysis 3) Exploratory Data Analysis 4) Pattern Recognition Analysis 5) Social Media Analysis 6) Speech Analysis 7) Text Analysis 8) Web Analysis
Data Mining Process
1) Business Understanding 2) Data Understanding 3) Data Preparation 4) Data Modeling 5) Evaluation 6) Deployment
Data Mining techniques
1) Estimation Analysis 2) Affinity Grouping Analysis 3) Cluster Analysis 4) Classification Analysis
The four primary reasons for low-quality information
1. Online customers intentionally enter inaccurate information to protect their privacy 2. Different systems have different information entry standards and formats 3. Data entry personnel enter abbreviated information to save time or erroneous information by accident 4. Third-party and external information contains inconsistencies, inaccuracies, and errors.
Four Common Characteristics of Big Data
1. Variety 2. Veracity 3. Volume 4. Velocity
A recommendation engine
A data mining algorithm that analyzes a customer's purchases and actions on a website and then uses the data to recommend complementary products.
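A minimal Python sketch (not from the textbook) of how a recommendation engine might score complementary products from co-occurrence counts; the purchase histories and product names are invented.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase histories: one set of product names per customer.
purchases = [
    {"laptop", "mouse", "laptop bag"},
    {"laptop", "mouse"},
    {"laptop", "laptop bag", "usb hub"},
    {"mouse", "mouse pad"},
]

# Count how often each pair of products is bought together.
pair_counts = Counter()
for basket in purchases:
    for a, b in combinations(sorted(basket), 2):
        pair_counts[(a, b)] += 1

def recommend(product, top_n=3):
    """Return products most often bought alongside the given product."""
    scores = Counter()
    for (a, b), count in pair_counts.items():
        if a == product:
            scores[b] += count
        elif b == product:
            scores[a] += count
    return [item for item, _ in scores.most_common(top_n)]

print(recommend("laptop"))  # e.g. ['laptop bag', 'mouse', 'usb hub']
```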
Primary Key
A field (or group of fields) that uniquely identifies a given entity in a table
Foreign Key
A primary key of one table that appears as an attribute in another table and acts to provide a logical relationship between the two tables
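A small illustrative Python/sqlite3 sketch of a primary key and a foreign key at work; the Customers and Orders tables and their columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce foreign key constraints

# CustomerID is the primary key of Customers; it reappears in Orders as a
# foreign key, creating the logical relationship between the two tables.
conn.execute("""CREATE TABLE Customers (
    CustomerID INTEGER PRIMARY KEY,
    Name       TEXT NOT NULL)""")
conn.execute("""CREATE TABLE Orders (
    OrderID    INTEGER PRIMARY KEY,
    CustomerID INTEGER NOT NULL,
    Total      REAL,
    FOREIGN KEY (CustomerID) REFERENCES Customers (CustomerID))""")

conn.execute("INSERT INTO Customers VALUES (1, 'Ada Lovelace')")
conn.execute("INSERT INTO Orders VALUES (100, 1, 59.99)")

# Joining on the key relationship ties each order back to its customer.
for row in conn.execute("""SELECT c.Name, o.OrderID, o.Total
                           FROM Orders o JOIN Customers c
                             ON o.CustomerID = c.CustomerID"""):
    print(row)  # ('Ada Lovelace', 100, 59.99)
```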
Information cleansing or scrubbing
A process that weeds out and fixes or discards inconsistent, incorrect, or incomplete information
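A rough Python sketch of information cleansing over invented customer records: it normalizes formats, discards incomplete rows, and removes duplicates.

```python
raw_records = [
    {"name": " Pat Lee ", "email": "PAT@EXAMPLE.COM"},
    {"name": "Pat Lee",   "email": "pat@example.com"},      # duplicate
    {"name": "",          "email": "unknown@example.com"},  # incomplete
]

def cleanse(records):
    seen = set()
    clean = []
    for rec in records:
        name = rec["name"].strip()
        email = rec["email"].strip().lower()
        if not name or not email:   # discard incomplete rows
            continue
        key = (name.lower(), email)
        if key in seen:             # discard duplicates
            continue
        seen.add(key)
        clean.append({"name": name, "email": email})
    return clean

print(cleanse(raw_records))  # [{'name': 'Pat Lee', 'email': 'pat@example.com'}]
```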
Optimization model
A statistical process that finds the way to make a design, system, or decision as effective as possible.
Prediction
A statement about what will or might happen in the future. Three common prediction models are optimization, forecasting, and regression (each defined below).
Five Common Characteristics of High-Quality Information
ACCURATE: is there an incorrect value in the information COMPLETE: is a value missing from the information CONSISTENT: is aggregate or summary information in agreement with detailed information TIMELY: is the information current with respect to business needs UNIQUE: is each transaction or event represented only once in the information
Social Media Analysis
Analyzes text flowing across the Internet, including unstructured text from blogs and messages
Business Focus Areas of Big Data
1) Data Mining 2) Data Analysis 3) Data Visualization
Correlation Analysis
Determines a statistical relationship between variables, often for the purpose of identifying predictive factors among the variables
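An illustrative Python example of correlation analysis on made-up numbers, using the standard library's Pearson correlation (available in Python 3.10+).

```python
from statistics import correlation  # Pearson's r, Python 3.10+

ad_spend = [10, 20, 30, 40, 50]   # hypothetical weekly ad spend
sales    = [12, 24, 33, 45, 51]   # hypothetical weekly sales

r = correlation(ad_spend, sales)
print(f"Pearson r = {r:.3f}")  # a value near +1 suggests a strong positive relationship
```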
Estimation Analysis
Determines values for an unknown continuous variable's behavior or its estimated future value
Exploratory Data Analysis
Identifies patterns in data, including outliers, and uncovers the underlying structure to understand relationships between the variables.
Reasons Business Analysis Is Difficult from Operational Databases
1) Inconsistent Data Definitions 2) Lack of Data Standards 3) Poor Data Quality 4) Inadequate Data Usefulness 5) Ineffective Direct Data Access
Classification Analysis
The process of organizing data into categories or groups for its most effective and efficient use.
data element (data field)
The smallest or basic unit of information. Can include a customer's name, address, email, discount rate, preferred shipping method, product name, or quantity ordered.
Data Artist
a business analytics specialist who uses visual tools to help people understand complex data
data broker
a business that collects personal information about consumers and sells that information to other organizations
Repository
a central location in which data is stored and managed
Big Data
a collection of large, complex data sets, including structured and unstructured data, which cannot be analyzed using traditional database methods and tools
record
a collection of related data elements
Outlier
a data value that is numerically distant from most of the other data points in a set of data.
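A quick Python sketch that flags outliers as values more than two standard deviations from the mean; the data set and threshold are only illustrative.

```python
from statistics import mean, stdev

data = [12, 14, 13, 15, 14, 13, 98]   # 98 sits far from the rest of the data set

mu, sigma = mean(data), stdev(data)
outliers = [x for x in data if abs(x - mu) > 2 * sigma]
print(outliers)  # [98]
```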
data warehouse
a logical collection of information, gathered from many different operational databases, that supports business analysis activities and decision-making tasks. The main purpose is to combine information from throughout an organization into a single repository.
information integrity
a measure of the quality of information
extraction, transformation, and loading (ETL)
a process that extracts info from internal and external databases, transforms it using a common set of enterprise definitions, and loads it into a data warehouse. That data warehouse then sends portions (or subsets) of the info to data marts.
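A toy Python sketch of the ETL flow, with plain lists standing in for the source databases and the warehouse; the field names are invented.

```python
sales_system = [{"cust": "ACME", "amt_usd": "1,200.50"}]
crm_system   = [{"customer_name": "acme corp", "revenue": 800.0}]

def extract():
    """Pull raw records from each source system."""
    return sales_system + crm_system

def transform(rows):
    """Map source-specific fields onto one common enterprise definition."""
    out = []
    for row in rows:
        name = (row.get("cust") or row.get("customer_name")).strip().upper()
        amount = row.get("amt_usd") or row.get("revenue")
        if isinstance(amount, str):
            amount = float(amount.replace(",", ""))
        out.append({"customer": name, "revenue": amount})
    return out

warehouse = []

def load(rows):
    """Append transformed rows to the warehouse table."""
    warehouse.extend(rows)

load(transform(extract()))
print(warehouse)
# [{'customer': 'ACME', 'revenue': 1200.5}, {'customer': 'ACME CORP', 'revenue': 800.0}]
```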
regression model
a statistical process for estimating the relationships among variables.
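A minimal Python example of a regression model fit on invented advertising and sales figures, using the standard library (Python 3.10+).

```python
from statistics import linear_regression  # simple least squares, Python 3.10+

ad_spend = [10, 20, 30, 40, 50]
sales    = [15, 28, 36, 49, 58]

slope, intercept = linear_regression(ad_spend, sales)
print(f"sales = {slope:.2f} * ad_spend + {intercept:.2f}")
print("predicted sales at spend 60:", round(slope * 60 + intercept, 1))
```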
data lake
a storage repository that holds a vast amount of raw data in its original format until the business needs it
data map
a technique for establishing a match, or balance, between the source data and the target data warehouse. Identifies data shortfalls and recognizes data issues.
Cluster Analysis
a technique used to divide an information set into mutually exclusive groups such that the members of each group are as close together as possible to one another and the different groups are as far apart as possible. Identifies similarities and differences among data sets.
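A small from-scratch k-means sketch in Python to illustrate cluster analysis; the 2-D points and k=2 are made up, and real work would use a library.

```python
import random

def kmeans(points, k, iterations=10):
    random.seed(0)
    centers = random.sample(points, k)
    for _ in range(iterations):
        # Assign each point to its nearest center.
        groups = [[] for _ in range(k)]
        for x, y in points:
            nearest = min(range(k),
                          key=lambda i: (x - centers[i][0]) ** 2 + (y - centers[i][1]) ** 2)
            groups[nearest].append((x, y))
        # Move each center to the mean of its assigned points.
        for i, grp in enumerate(groups):
            if grp:
                centers[i] = (sum(p[0] for p in grp) / len(grp),
                              sum(p[1] for p in grp) / len(grp))
    return centers, groups

points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centers, groups = kmeans(points, k=2)
print(centers)  # two centers, one near (1.3, 1.3) and one near (8.3, 8.3)
```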
relational database management system
allows users to create, read, update, and delete data in a relational database.
data-driven decision management
an approach to business governance that values decisions that can be backed up with verifiable data. The success of the data-driven approach relies on the quality of the data gathered and the effectiveness of its analysis and interpretation.
dynamic catalog
an area of a website that stores information about products in a database. Stores dynamic website info
data point
an individual item on a graph or a chart
data-driven website
an interactive website kept constantly updated and relevant to the needs of its customers using a database
Web Analysis
analyzes unstructured data associated with websites to identify consumer behavior and website navigation
Text analysis
analyzes unstructured data to find trends and patterns in words and sentences
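A tiny Python sketch of text analysis: counting the most frequent words across a few invented customer comments to surface trends.

```python
from collections import Counter
import re

comments = [
    "Shipping was fast and the price was great",
    "Great price, slow shipping",
    "Fast shipping, great support",
]

words = Counter()
for comment in comments:
    words.update(re.findall(r"[a-z']+", comment.lower()))

print(words.most_common(3))  # e.g. [('shipping', 3), ('great', 3), ...]
```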
Algorithms
are mathematical formulas placed in software that perform an analysis on a data set
integrity constraints
are rules that help ensure the quality of information
Attributes (also called columns or fields)
are the data elements associated with an entity
Structured Query Language (SQL)
asks users to write lines of code to answer questions against a database. Managers typically interact with QBE tools, and MIS professionals have the skills required to code SQL.
Comparative Analysis
can compare two or more data sets to identify patterns and trends.
information cube
common term for the representation of multidimensional information
data mart
contains a subset of data warehouse information. Think of data warehouses as having a more organizational focus and data marts as having a functional focus.
Database Management System (DBMS)
creates, reads, updates, and deletes data in a database while controlling access and security. Managers send in requests and the DBMS performs the actual manipulation of the data in the database.
Data warehousing components
1) Data Marts 2) Information Cleansing 3) Business Intelligence
physical view of information
deals with the physical storage of information on a storage device
business rule
defines how a company performs certain aspects of its business and typically results in either a yes/no or true/false answer
Data visualization
describes technologies that allow users to see or visualize data to transform information into a business perspective
data quality audit
determines the accuracy and completeness of an organization's data
Business-critical integrity constraints
enforce business rules vital to an organization's success and often require more insight and knowledge than relational integrity constraints. They tend to mirror the very rules by which an organization achieves success.
dirty data
erroneous or flawed data. Complete removal of dirty data from a source is impractical or virtually impossible.
market basket analysis
evaluates such items as websites and checkout scanner information to detect customers' buying behavior and predict future behavior by identifying affinities among customers' choices of products and services
Data Scientist
extracts knowledge from data by performing statistical analysis, data mining, and advanced analytics on big data to identify trends, market changes, and other relevant information
query by example (QBE)
helps users graphically design the answer to a question against a database
source data
identifies the primary location where data is collected.
dynamic information
includes data that change based on user actions. By contrast, static websites supply only information that will not change until the content editor changes it; dynamic information changes when a user requests information.
static information
includes fixed data incapable of change in the event of a user action
Data Validation
includes the tests and evaluations used to determine compliance with data governance policies to ensure correctness of data. Helps to ensure that every data value is correct and accurate.
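An illustrative Python sketch of data validation, where each rule stands in for a hypothetical governance policy and returns True only when the record complies.

```python
import re

rules = {
    "email has a valid format":
        lambda r: re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", r["email"]) is not None,
    "quantity is a positive number":
        lambda r: isinstance(r["quantity"], (int, float)) and r["quantity"] > 0,
    "country uses a two-letter code":
        lambda r: re.fullmatch(r"[A-Z]{2}", r["country"]) is not None,
}

record = {"email": "pat@example.com", "quantity": 3, "country": "US"}

failures = [name for name, check in rules.items() if not check(record)]
print("valid" if not failures else f"failed rules: {failures}")
```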
Business Advantages of a Relational Database
1) Increased Flexibility 2) Increased Scalability and Performance 3) Reduced Information Redundancy 4) Increased Information Integrity 5) Increased Information Security
Database
maintains information about various types of objects (inventory), events (transactions), people (employees), and places (warehouses)
Data visualization tools
move beyond Excel graphs and charts into sophisticated analysis techniques such as controls, instruments, maps, time-series graphs, and more
information integrity issues
occur when a system produces incorrect, inconsistent, or duplicate data. Data integrity issues can cause managers to consider the system reports invalid and make decisions based on other sources.
information inconsistency
occurs when the same data element has different values
Analysis paralysis
occurs when the user goes into an emotional state of over-analyzing (or over-thinking) a situation so that a decision or action is never taken, in effect paralyzing the outcome
data set
an organized collection of data
Forecasting model
predictions based on time series information, allowing users to manipulate the time series for forecasting activities
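A minimal Python sketch of a forecasting model: a three-period moving average over an invented monthly time series.

```python
monthly_sales = [100, 110, 105, 120, 130, 125]   # hypothetical monthly figures

window = 3
forecast = sum(monthly_sales[-window:]) / window  # average of the last 3 periods
print(f"forecast for next month: {forecast:.1f}")  # (120 + 130 + 125) / 3 = 125.0
```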
infographics (information graphics)
present the results of data analysis, displaying the patterns, relationships, and trends in a graphical format.
Distributed Computing
processes and manages algorithms across many machines in a computing environment. -Individual computers are networked together across geographical areas and work together to execute a workload or computing process as if they were a single computing environment
Real-time systems
provide real-time information in response to requests. Many organizations use real-time systems to uncover key corporate transactional information. The growing demand for real-time information stems from organizations' need to make faster and more effective decisions, keep smaller inventories, operate more efficiently, and track performance more carefully.
Metadata
provides details about data. For example, metadata for an image could include its size, resolution, and date created
Information Granularity
refers to the extent of detail within the information (fine and detailed or coarse and abstract)
data steward
responsible for ensuring the policies and procedures are implemented across the organization and acts as a liaison between the MIS department and the business
Affinity Grouping Analysis
reveals the relationship between variables along with the nature and frequency of the relationships
relational integrity constraints
rules that enforce basic and fundamental information-based constraints
Fast Data
the application of big data analytics to smaller data sets in near-real time or real time in order to solve a problem or create business value.
Pattern Recognition Analysis
the classification or labeling of an identified pattern in the machine learning process
Data aggregation
the collection of data from various sources for the purpose of data processing
Virtualization
the creation of a virtual (rather than actual) version of computing resources, such as an operating system, a server, a storage device, or network resources.
information redundancy
the duplication of data, or the storage of the same data in multiple places. One primary goal of a database is to eliminate info redundancy by recording each piece of info in only one place in the database. This saves disk space, makes performing info updates easier, and improves info quality.
content creator
the person responsible for creating the original website content
content editor
the person responsible for updating and maintaining website content
Data Mining
the process of analyzing data to extract info not offered by the raw data alone. Data mining allows companies to compile a complete picture of their operations, all within a single view, allowing them to identify trends and improve forecasts.
Speech analysis
the process of analyzing recorded calls to gather information
Data profiling
the process of collecting statistics and information about data in an existing source. Insights can determine how easy or difficult it will be to use existing data for other purposes along with providing metrics on data quality.
Anomaly detection
the process of identifying rare or unexpected items or events in a data set that do not conform to other items in the data set
Data replication
the process of sharing information to ensure consistency between multiple data sources.
Analytics
the science of fact-based decision making. Advanced analytics uses data patterns to make forward-looking predictions that explain to the organization where it is headed
Business Intelligence Dashboards
track corporate metrics such as critical success factors and key performance indicators and include advanced capabilities such as interactive controls, allowing users to manipulate data for analysis
Data mining tools
use a variety of techniques to find patterns and relationships in large volumes of information that predict future behavior and guide decision making.
Behavioral Analysis
using data about people's behaviors to understand intent and predict future actions
competitive monitoring
when a company keeps tabs on its competitors' activities on the web using software that automatically tracks all competitor website activities, such as discounts and new products