CIS Chapter #6
five common characteristics of high quality information
1. accuracy 2. completeness 3. consistency 4. timeliness 5. uniqueness
collect information from multiple systems in a common location that uses a universal querying tool
A key idea within data warehousing is to:
optimization model
A statistical process that finds the way to make a design, system, or decision as effective as possible; for example, finding the values of controllable variables that determine maximal productivity or minimal waste.
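Example sketch (not from the chapter): one way to picture an optimization model is a search over a controllable variable for the value that maximizes an outcome. The demand curve, unit cost, and function names below are made-up assumptions for illustration.

```python
# Hypothetical example: pick the price that maximizes profit.
# The linear demand curve and the unit cost are assumptions.

def profit(price: float) -> float:
    unit_cost = 4.0
    units_sold = max(0.0, 100.0 - 5.0 * price)  # assumed demand at this price
    return (price - unit_cost) * units_sold

def best_price(candidates: list[float]) -> float:
    # Brute-force search over candidate values of the controllable variable.
    return max(candidates, key=profit)

candidates = [p / 2 for p in range(0, 41)]  # 0.0, 0.5, ..., 20.0
optimal = best_price(candidates)
print(optimal, profit(optimal))
```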
social media analysis
Analyzes text flowing across the Internet, including unstructured text from blogs and messages
web analysis
Analyzes unstructured data associated with websites to identify consumer behavior and website navigation.
text analysis
Analyzes unstructured data to find trends and patterns in words and sentences
outliers
Anomaly detection helps to identify ___________ in the data that can cause problems with mathematical modeling
1. increased flexibility 2. increased information integrity 3. increased scalability and performance 4. increased information security 5. reduced information redundancy
Business Advantages of a Relational Database
hierarchical, network, relational (most important)
DBMS use three primary data models for organizing information:
What is the difference between data governance and data stewardship?
Data governance focuses on enterprisewide policies and procedures, while data stewardship focuses on the strategic implementation of the policies and procedures
coarse granularity; drilling down; drilling up
Data mining can also begin at a summary information level (_____________________) and progress through increasing levels of detail (_____________________) or the reverse (______________________)
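Example sketch (not from the chapter): drilling down can be pictured as re-totaling the same records at a finer level of detail. The sales records and the `total_by` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical sales records: (region, store, amount).
sales = [
    ("East", "Store A", 120),
    ("East", "Store B", 80),
    ("West", "Store C", 200),
]

def total_by(records, level):
    # level 0 groups by region (coarse); level 1 groups by store (fine).
    totals = defaultdict(int)
    for record in records:
        totals[record[level]] += record[2]
    return dict(totals)

by_region = total_by(sales, 0)  # coarse granularity (summary level)
by_store = total_by(sales, 1)   # drilling down to more detail
print(by_region)
print(by_store)
```

Going from `by_store` back to `by_region` is the reverse direction, drilling up.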
large
Data-driven capabilities are especially useful when a firm needs to offer ________ amounts of information, products, or services
https://html1-cluster-e.mheducation.com/smartbook2/data/156737/highlighted_epubmhe/OPS/img/chapter06/bal04716_0607.png
Important chart
age, profession, or income (can include totals, counts, averages, and the like)
One example of a data aggregation is to gather information about particular groups based on specific variables such as:
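Example sketch (not from the chapter): aggregating by a variable such as profession, producing counts, totals, and averages. The records and field names are invented for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records; the field names are assumptions.
people = [
    {"profession": "nurse", "income": 60000},
    {"profession": "nurse", "income": 70000},
    {"profession": "teacher", "income": 50000},
]

def aggregate_income(rows):
    # Collect incomes per profession, then summarize each group.
    groups = defaultdict(list)
    for row in rows:
        groups[row["profession"]].append(row["income"])
    return {
        profession: {"count": len(v), "total": sum(v), "average": mean(v)}
        for profession, v in groups.items()
    }

summary = aggregate_income(people)
print(summary)
```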
only one
One primary goal of a database is to eliminate information redundancy by recording each piece of information in __________________ place in the database.
unstructured
Organizational data includes far more than simple structured data elements in a database; the set of data also includes _______________________ data such as voice mail, customer phone calls, text messages, and video clips, along with numerous new forms of data, such as tweets from Twitter
impossible
The complete removal of dirty data from a source is impractical or virtually ______________
costs
The more complete and accurate a company wants its information to be, the more it ________.
speech analysis
The process of analyzing recorded calls to gather information; brings structure to customer interactions and exposes information buried in customer contact center interactions with an enterprise
think of data warehouses as having a more organizational focus and data marts as having a functional focus
To distinguish between data warehouses and data marts:
flat architecture
While a traditional data warehouse stores data in files or folders, a data lake uses a _____________ to store data
managers; MIS professionals
________________ typically interact with QBE tools, and _____________________ have the skills required to code SQL
data artist
a business analytics specialist who uses visual tools to help people understand complex data
data broker
a business that collects personal information about consumers and sells that information to other organizations
big data
a collection of large, complex data sets, including structured and unstructured data, which cannot be analyzed using traditional database methods and tools
recommendation engine
a data mining algorithm that analyzes a customer's purchases and actions on a website and then uses the data to recommend complementary products; e.g., Netflix uses a recommendation engine to analyze each customer's film-viewing habits and provide recommendations for other customers with Cinematch, its movie recommendation system
dimension
a particular attribute of information
extraction, transformation, and loading (ETL)
a process that extracts information from internal and external databases, transforms it using a common set of enterprise definitions, and loads it into a data warehouse. The data warehouse then sends portions (or subsets) of the information to data marts
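Example sketch (not from the chapter): the three ETL steps as three small functions, using an in-memory SQLite database as a stand-in for the data warehouse. The source rows, the "amounts in whole cents" definition, and the table name are all assumptions.

```python
import sqlite3

# Hypothetical sources: one system stores dollars as text, another stores cents.
store_sales = [("alice", "12.50"), ("bob", "3.00")]
web_sales = [("CAROL", 800)]

def extract():
    # Pull rows from each source and tag where they came from.
    return [("store", r) for r in store_sales] + [("web", r) for r in web_sales]

def transform(tagged_rows):
    # Assumed common definition: title-case names, amounts in whole cents.
    out = []
    for source, (name, amount) in tagged_rows:
        cents = round(float(amount) * 100) if source == "store" else int(amount)
        out.append((name.title(), cents))
    return out

def load(rows):
    # An in-memory SQLite database stands in for the data warehouse.
    warehouse = sqlite3.connect(":memory:")
    warehouse.execute("CREATE TABLE sale (customer TEXT, cents INTEGER)")
    warehouse.executemany("INSERT INTO sale VALUES (?, ?)", rows)
    return warehouse

warehouse = load(transform(extract()))
loaded = warehouse.execute(
    "SELECT customer, cents FROM sale ORDER BY customer"
).fetchall()
print(loaded)
```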
information cleansing/scrubbing
a process that weeds out and fixes or discards inconsistent, incorrect, or incomplete information.
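Example sketch (not from the chapter): a tiny cleansing pass that fixes inconsistent formatting and discards incomplete or duplicate records. The records and the rules chosen here are illustrative assumptions; real cleansing rules depend on the business.

```python
# Hypothetical customer records: one inconsistent, one duplicate, one incomplete.
raw = [
    {"name": "Ann Lee", "email": "ANN@EXAMPLE.COM "},
    {"name": "Ann Lee", "email": "ann@example.com"},
    {"name": "", "email": "ghost@example.com"},
    {"name": "Raj Patel", "email": "raj@example.com"},
]

def cleanse(records):
    cleaned, seen = [], set()
    for rec in records:
        name = rec["name"].strip()
        email = rec["email"].strip().lower()  # fix inconsistent formatting
        if not name or "@" not in email:      # discard incomplete or invalid rows
            continue
        if email in seen:                     # discard duplicates
            continue
        seen.add(email)
        cleaned.append({"name": name, "email": email})
    return cleaned

clean = cleanse(raw)
print(clean)
```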
prediction
a statement about what will happen or might happen in the future; for example, predicting future sales or employee turnover
regression model
a statistical process for estimating the relationships among variables
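Example sketch (not from the chapter): the simplest regression model is a straight line fitted by least squares. The ad-spend and sales numbers are invented so that the relationship is exactly linear.

```python
def fit_line(xs, ys):
    # Ordinary least squares for one predictor: y = slope * x + intercept.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: sales are exactly 2 * ad_spend + 1.
ad_spend = [1, 2, 3, 4]
sales = [3, 5, 7, 9]
slope, intercept = fit_line(ad_spend, sales)
print(slope, intercept)
```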
data lake
a storage repository that holds a vast amount of raw data in its original format until the business needs it
data map
a technique for establishing a match, or balance, between the source data and the target data warehouse; it identifies data shortfalls and recognizes data issues, and can also alert managers to inconsistencies or help determine the cause and effects of enterprise-wide business decisions
cluster analysis
a technique used to divide an information set into mutually exclusive groups such that the members of each group are as close together as possible to one another and the different groups are as far apart as possible; it groups similar attributes together to discover segments or clusters and then examines the attributes and values that define the clusters or segments
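Example sketch (not from the chapter): a one-dimensional k-means loop is one common way to form mutually exclusive groups whose members are close together. The chapter does not name an algorithm; k-means, the starting centers, and the ages below are assumptions.

```python
def k_means_1d(values, centers, rounds=10):
    # Repeatedly assign each value to its nearest center, then move each
    # center to the average of the values assigned to it.
    for _ in range(rounds):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

# Hypothetical customer ages and starting centers.
ages = [21, 23, 25, 60, 62, 64]
centers, groups = k_means_1d(ages, centers=[20.0, 70.0])
print(centers, groups)
```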
1. web browsers are much easier to use than directly accessing the database by using a custom-query tool 2. the web interface requires few or no changes to the database model 3. it costs less to add a web interface in front of a DBMS than to redesign and rebuild the system to support changes 4. easy to manage content because website owners can make changes without relying on MIS professionals, and users can update a data-driven website with little or no training 5. easy to store large amounts of data because data-driven websites can keep large volumes of information organized; website owners can use templates to implement changes for layouts, navigation, or website structure 6. easy to eliminate human errors because data-driven websites trap data-entry errors, eliminating inconsistencies while ensuring that all information is entered correctly
advantages to using data-driven websites
data-driven decision management
an approach to business governance that values decisions that can be backed up with verifiable data; the success of this approach is reliant upon the quality of the data gathered and the effectiveness of its analysis and interpretation
data point
an individual item on a graph or a chart
data set
an organized collection of data
encompasses all organizational information, and its primary purpose is to support the performance of managerial analysis tasks
analytical information
the data elements associated with an entity
attributes (columns, fields)
defines how a company performs certain aspects of its business and typically results in either a yes/no or true/false answer
business rule
enforce business rules vital to an organization's success and often require more insight and knowledge than relational integrity constraints
business-critical integrity constraints
comparative analysis
can compare two or more data sets to identify patterns and trends; employees can base their decisions on data sets, experience, or knowledge and, preferably, a combination of all three
information cube
common term for the representation of multi-dimensional information
data mart
contains a subset of data warehouse information
the person responsible for creating the original website content
content creator
the person responsible for updating and maintaining website content
content editor
compiles all of the metadata about the data elements in the data model
data dictionary
the smallest or basic unit of information, e.g., customer's name, address, email, discount rate, preferred shipping method, product name, quantity ordered
data element (data field)
occurs when a company examines its data to determine if it can meet business expectations, while identifying possible data gaps or where missing data might exist
data gap analysis
refers to the overall management of the availability, usability, integrity, and security of company data
data governance
the time it takes for data to be stored or retrieved
data latency
logical data structures that detail the relationships among data elements using graphics or pictures
data models
responsible for ensuring the policies and procedures are implemented across the organization and acts as a liaison between the MIS department and the business
data steward
the management and oversight of an organization's data assets to help provide business users with high-quality data that is easily accessible in a consistent manner
data stewardship
includes the tests and evaluations used to determine compliance with data governance policies to ensure correctness of data
data validation
outlier
data value that is numerically distant from most of the other data points in a set of data
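Example sketch (not from the chapter): one simple way to flag values that sit far from the rest is to compare each value's distance from the median with the typical distance (the median absolute deviation). The threshold of 3 and the order counts are assumptions.

```python
from statistics import median

def find_outliers(values, threshold=3.0):
    # Flag values whose distance from the median is more than `threshold`
    # times the median absolute deviation (MAD).
    mid = median(values)
    mad = median([abs(v - mid) for v in values])
    if mad == 0:
        return []  # no spread to measure against in this simple sketch
    return [v for v in values if abs(v - mid) / mad > threshold]

# Hypothetical daily order counts with one suspicious value.
daily_orders = [10, 12, 11, 13, 12, 95]
outliers = find_outliers(daily_orders)
print(outliers)
```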
a logical collection of information - gathered from many different operational databases - that supports business analysis activities and decision-making tasks
data warehouse
an interactive website kept constantly updated and relevant to the needs of its customers using a database
data-driven website
maintains information about various types of objects (inventory), events (transactions), people (employees), and places (warehouses)
database
creates, reads, updates, and deletes data in a database while controlling access and security
database management system (DBMS)
data visualization
describes technologies that allow users to see or visualize data to transform information into a business perspective
correlation analysis
determines a statistical relationship between variables, often for the purpose of identifying predictive factors among the variables
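Example sketch (not from the chapter): the Pearson correlation coefficient is one standard measure of a statistical relationship between two variables (near 1 or -1 means strongly related, near 0 means weak). The temperature and sales figures are invented.

```python
from math import sqrt

def pearson(xs, ys):
    # Pearson correlation: covariance divided by the product of the spreads.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical data: ice cream sales rise with temperature, soup sales fall.
temperature = [20, 25, 30, 35]
ice_cream_sales = [40, 50, 60, 70]
soup_sales = [70, 60, 50, 40]
r_ice_cream = pearson(temperature, ice_cream_sales)
r_soup = pearson(temperature, soup_sales)
print(r_ice_cream, r_soup)
```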
variety
different forms of structured and unstructured data
an area of a website that stores information about products in a database
dynamic catalog
includes data that change based on user actions
dynamic information
stores information about a person, place, thing, transaction, or event
entity (table)
dirty data
erroneous or flawed data
market basket analysis
evaluates such items as websites and checkout scanner information to detect customers' buying behavior and predict future behavior by identifying affinities among customers' choices of products and services
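Example sketch (not from the chapter): a basic way to look for affinities is to count how often pairs of products appear in the same basket. The baskets are invented, and real tools use richer measures than raw counts.

```python
from collections import Counter
from itertools import combinations

# Hypothetical checkout baskets.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "cereal"},
    {"bread", "butter", "cereal"},
]

def pair_counts(baskets):
    # Count every pair of products that appears together in a basket.
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts

counts = pair_counts(baskets)
top_pair, top_count = counts.most_common(1)[0]
print(top_pair, top_count)
```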
data scientist
extracts knowledge from data by performing statistical analysis, data mining, and advanced analytics on big data to identify trends, market changes, and other relevant information
a primary key of one table that appears as an attribute in another table and acts to provide a logical relationship between the two tables
foreign key
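Example sketch (not from the chapter): two tables in an in-memory SQLite database, where `orders.customer_id` is a foreign key pointing at the primary key of `customer`. The table and column names are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
db.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT)")
db.execute("""
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(customer_id),
        total REAL
    )
""")
db.execute("INSERT INTO customer VALUES (1, 'Ann')")
db.execute("INSERT INTO orders VALUES (100, 1, 25.0)")

# The foreign key provides the logical relationship used to join the tables.
rows = db.execute("""
    SELECT customer.name, orders.total
    FROM orders JOIN customer ON orders.customer_id = customer.customer_id
""").fetchall()

# An order that points at a customer who does not exist is rejected.
try:
    db.execute("INSERT INTO orders VALUES (101, 99, 10.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rows, rejected)
```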
1. variety 2. veracity 3. volume 4. velocity
four common characteristics of big data
1. information type 2. information timeliness 3. information quality 4. information governance
four primary traits that determine the value of information
classification analysis
the process of organizing data into categories or groups for its most effective and efficient use
Where has the business been? Historical perspective offers important variables for determining trends and patterns. Where is the business now? Looking at the current business situation allows managers to take effective action to solve issues before they grow out of control. Where is the business going? Setting strategic direction is critical for planning and creating solid business strategies.
how managers can use BI to answer tough business questions:
exploratory data analysis
identifies patterns in data, including outliers, uncovering the underlying structure to understand relationships between the variables
source data
identifies the primary location where data is collected, e.g., invoices, spreadsheets, time sheets, transactions, and electronic sources such as other databases
a broad administrative area that deals with identifying individuals in a system (such as a country, a network, or an enterprise) and controlling their access to resources within that system by associating user rights and restrictions with the established identity
identity management
refers to the extent of detail within the information (fine and detailed or coarse and abstract)
information granularity
occurs when the same data element has different values
information inconsistency
a measure of the quality of information
information integrity
occur when a system produces incorrect, inconsistent, or duplicate data
information integrity issues
the duplication of data, or the storage of the same data in multiple places
information redundancy
rules that help ensure the quality of information; the database design needs to consider these
integrity constraints
focuses on how individual users logically access information to meet their own particular business needs
logical view of information
the practice of gathering data and ensuring that it is uniform, accurate, consistent, and complete, including such entities as customers, suppliers, products, sales, employees, and other critical entities that are commonly integrated across organizational systems
master data management (MDM)
algorithms
mathematical formulas placed in software that performs an analysis on a data set
provides details about data, e.g., metadata for an image could include size, resolution, and date created
metadata
data visualization tools
move beyond Excel graphs and charts into sophisticated analysis techniques such as controls, instruments, maps, time-series graphs, and more
analysis paralysis
occurs when the user goes into an emotional state of over-analysis (or over-thinking) a situation so that a decision or action is never taken, in effect paralyzing the outcome
deals with the physical storage of information on a storage device
physical view of information
forecasting model
predictions based on time-series information allowing users to manipulate the time series for forecasting activities
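Example sketch (not from the chapter): a moving average is one of the simplest forecasts built from time-series information. The window size and monthly figures are assumptions.

```python
def moving_average_forecast(series, window=3):
    # Forecast the next value as the average of the last `window` observations.
    if len(series) < window:
        raise ValueError("need at least `window` observations")
    return sum(series[-window:]) / window

# Hypothetical monthly sales, oldest first.
monthly_sales = [100, 110, 120, 130]
forecast = moving_average_forecast(monthly_sales)
print(forecast)
```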
infographics (information graphics)
presents the results of data analysis, displaying the patterns, relationships, and trends in a graphical format
a field (or group of fields) that uniquely identifies a given entity in a table
primary key
to combine information, more specifically, strategic information, throughout an organization into a single repository in such a way that the people who need that information can make decisions and undertake business analysis
primary purpose of a data repository:
distributed computing
processes and manages algorithms across many machines in a computing environment
helps users graphically design the answer to a question against a database
query-by-example (QBE) tool
immediate, up-to-date information
real-time information
provide real-time information in response to requests
real-time systems
a collection of related data elements
record
allows users to create, read, update, and delete data in a relational database
relational database management system
stores information in the form of logically related two-dimensional tables
relational database model
rules that enforce basic and fundamental information-based constraints
relational integrity constraints
a central location in which data is stored and managed
repository
1. business understanding 2. data understanding 3. data preparation 4. data modeling 5. evaluation 6. deployment
six primary phases in the data mining process
Data warehouses go even a step further by __________________ information. Example?
standardizing; gender, for instance, can be referred to in many ways (Male, Female, M/F, 1/0), but it should be standardized in a data warehouse with one common way of referring to each data element that stores gender (M/F).
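Example sketch (not from the chapter): standardizing the gender values listed above into one M/F code. Mapping 1 to M and 0 to F is an assumption, since the card does not say which number means which.

```python
# Assumed mapping from source-system values to the warehouse standard (M/F).
# Which of 1/0 means male or female is a guess for illustration.
GENDER_CODES = {
    "male": "M", "m": "M", "1": "M",
    "female": "F", "f": "F", "0": "F",
}

def standardize_gender(value):
    key = str(value).strip().lower()
    return GENDER_CODES.get(key)  # None marks a value that needs review

source_values = ["Male", "F", 1, "female", "unknown"]
standardized = [standardize_gender(v) for v in source_values]
print(standardized)
```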
includes fixed data incapable of change in the event of a user action
static information
business rule
stating that merchandise returns are allowed within 10 days of purchase is an example of a ________________________.
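Example sketch (not from the chapter): the 10-day return rule written as a function with a true/false answer. The function and constant names are hypothetical.

```python
from datetime import date

RETURN_WINDOW_DAYS = 10  # the rule from the card: returns allowed within 10 days

def return_allowed(purchased: date, returned: date) -> bool:
    # A business rule resolves to a yes/no (True/False) answer.
    return 0 <= (returned - purchased).days <= RETURN_WINDOW_DAYS

on_time = return_allowed(date(2024, 1, 1), date(2024, 1, 11))
too_late = return_allowed(date(2024, 1, 1), date(2024, 1, 12))
print(on_time, too_late)
```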
asks users to write lines of code to answer questions against a database
structured query language (SQL)
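Example sketch (not from the chapter): a short SQL query run against an in-memory SQLite database from Python. The `product` table and its rows are invented.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE product (name TEXT, price REAL)")
db.executemany(
    "INSERT INTO product VALUES (?, ?)",
    [("pen", 1.5), ("notebook", 4.0), ("bag", 30.0)],
)

# The question "which products cost less than $5?" written as lines of SQL.
query = "SELECT name FROM product WHERE price < ? ORDER BY price"
cheap = [row[0] for row in db.execute(query, (5.0,))]
print(cheap)
```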
velocity
the analysis of streaming data as it travels around the internet
fast data
the application of big data analytics to smaller data sets in near-real or real-time in order to solve a problem or create business value
pattern recognition analysis
the classification or labeling of an identified pattern in the machine learning process
data aggregation
the collection of data from various sources for the purpose of data processing
virtualization
the creation of a virtual (rather than actual) version of computing resources, such as an operating system, a server, a storage device, or network resources
data mining
the process of analyzing data to extract information not offered by the raw data alone
data profiling
the process of collecting statistics and information about data in an existing source
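Example sketch (not from the chapter): profiling one column of an existing source by counting rows, missing values, and distinct values. The records and the choice of statistics are assumptions.

```python
def profile(rows, column):
    # Collect simple statistics about one column of an existing data source.
    values = [row.get(column) for row in rows]
    present = [v for v in values if v not in (None, "")]
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
    }

# Hypothetical source records.
customers = [
    {"name": "Ann", "state": "CO"},
    {"name": "Raj", "state": "CO"},
    {"name": "Lee", "state": ""},
    {"name": "Kim", "state": "TX"},
]
state_profile = profile(customers, "state")
print(state_profile)
```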
anomaly detection
the process of identifying rare or unexpected items or events in a data set that do not conform to other items in the data set
data replication
the process of sharing information to ensure consistency between multiple data sources
volume
the scale of data
analytics
the science of fact-based decision making; uses software-based algorithms and statistics to derive meaning from data
veracity
the uncertainty of data, including biases, noise, and abnormalities
1. optimization model 2. forecasting model 3. regression model
three data mining modeling techniques for predictions:
data, discovery, deployment
three elements of data mining:
1. data mining 2. data analysis 3. data visualization
three focus areas businesses are using to dissect, analyze, and understand organizational data
time-series information
time-stamped information collected at a particular frequency
business intelligence dashboards
track corporate metrics such as critical success factors and key performance indicators and include advanced capabilities such as interactive controls, allowing users to manipulate data for analysis
encompasses all of the information contained within a single business process or unit of work, and its primary purpose is to support daily operational tasks
transactional information
transactional and analytical
two primary types of information
1. relational 2. business critical
two types of integrity constraints
data mining tools
use a variety of techniques to find patterns and relationships in large volumes of information that predict future behavior and guide decision making
data quality audits
used to determine the accuracy and completeness of a firm's data
behavioral analysis
using data about people's behaviors to understand intent and predict future actions.
competitive monitoring
where a company keeps tabs on its competitors' activities on the web using software that automatically tracks all competitor website activities such as discounts and new products