Ch 6
attributes
(also called columns or fields) are the data elements associated with an entity ex. MUSICIAN ID, MUSICIAN NAME
entity
(also referred to as a table) stores information about a person, place, thing, transaction, or event. ex. MUSICIANS
Reasons Business Analysis Is Difficult from Operational Databases
- inconsistent data definitions - lack of data standards - poor data quality - inadequate data usefulness - ineffective direct data access
4 common characteristics of big data
- variety - veracity - volume - velocity
Five Common Characteristics of High-Quality Information
1) accurate 2) complete 3) consistent 4) timely 5) unique
Data Mining Process Model Activities (not highlighted)
1. business understanding - Gain a clear understanding of the business problem that must be solved and how it impacts the company.
2. data understanding - Analyze all current data and identify any data quality issues.
3. data preparation - Gather and organize the data in the correct formats and structures for analysis.
4. data modeling - Apply mathematical techniques to identify trends and patterns in the data.
5. evaluation - Analyze the trends and patterns to assess the potential for solving the business problem.
6. deployment - Deploy the discoveries to the organization for use in everyday business.
Advanced Data Analytics (not highlighted)
Behavioral analysis - Uses data about people's behaviors to understand intent and predict future actions.
Correlation analysis - Determines a statistical relationship between variables, often for the purpose of identifying predictive factors among the variables.
Exploratory data analysis - Identifies patterns in data, including outliers, uncovering the underlying structure to understand relationships between the variables.
Pattern recognition analysis - The classification or labeling of an identified pattern in the machine learning process.
Social media analysis - Analyzes text flowing across the Internet, including unstructured text from blogs and messages.
Speech analysis - The process of analyzing recorded calls to gather information; brings structure to customer interactions and exposes information buried in customer contact center interactions with an enterprise. Speech analysis is heavily used in the customer service department to help improve processes by identifying angry customers and routing them to the appropriate customer service representative.
Text analysis - Analyzes unstructured data to find trends and patterns in words and sentences. Text mining a firm's customer support email might identify which customer service representative is best able to handle the question, allowing the system to forward it to the right person.
Web analysis
Data Mining Techniques
Estimation - determines values for an unknown continuous variable behavior or estimated future value
Affinity grouping - reveals the relationship between variables along with the nature and frequency of the relationships
Cluster analysis - a technique used to divide an information set into mutually exclusive groups such that the members of each group are as close together as possible to one another and the different groups are as far apart as possible. Cluster analysis identifies similarities and differences among data sets, allowing similar data sets to be clustered together. A customer database includes attributes such as name and address, demographic information such as gender and age, and financial attributes such as income and revenue spent.
Classification - the process of organizing data into categories or groups for its most effective and efficient use
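The cluster analysis definition above can be illustrated with a small sketch. It uses k-means, which is one common clustering algorithm; the notes do not name a specific algorithm, so that choice is an assumption, as are the function name `kmeans_1d` and the made-up income figures.

```python
def kmeans_1d(values, centers, iterations=10):
    """Group numbers around the nearest center, then move each center to its group mean."""
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # An empty group keeps its previous center.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return groups, centers

# Hypothetical customer incomes in thousands of dollars.
incomes = [21, 25, 30, 88, 92, 95]
groups, centers = kmeans_1d(incomes, centers=[min(incomes), max(incomes)])
```

Members of each resulting group sit close together while the two groups sit far apart, which is the property the definition describes.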
four primary reasons for low-quality information are:
1) Online customers intentionally enter inaccurate information to protect their privacy. 2) Different systems have different information entry standards and formats. 3) Data-entry personnel enter abbreviated information to save time or erroneous information by accident. 4) Third-party and external information contains inconsistencies, inaccuracies, and errors.
Structured and Unstructured Data Examples
Structured - sensor data, weblog data, financial data, clickstream data. Unstructured - satellite images, photographic data, video data, text messages.
Here are a few examples of how managers can use BI to answer tough business questions:
Where has the business been? Historical perspective offers important variables for determining trends and patterns. Where is the business now? Looking at the current business situation allows managers to take effective action to solve issues before they grow out of control. Where is the business going? Setting strategic direction is critical for planning and creating solid business strategies.
data artist
a business analytics specialist who uses visual tools to help people understand complex data
repository
a central location in which data is stored and managed
recommendation engine
a data mining algorithm that analyzes a customer's purchases and actions on a website and then uses the data to recommend complementary products. Netflix uses a recommendation engine to analyze each customer's film-viewing habits to provide recommendations for other customers with Cinematch, its movie recommendation system
primary key
a field (or group of fields) that uniquely identifies a given record in a table. In the table RECORDINGS, the primary key is the field RecordingID that uniquely identifies each record in the table.
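A minimal sketch of the primary key idea above, using Python's built-in sqlite3 module. Only the table name RECORDINGS and the field RecordingID come from the notes; the Title column, the sample rows, and the helper `try_insert` are assumptions for illustration.

```python
import sqlite3

# In-memory database; the table layout beyond RecordingID is assumed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE RECORDINGS (RecordingID INTEGER PRIMARY KEY, Title TEXT)")
conn.execute("INSERT INTO RECORDINGS VALUES (1, 'Monkey Business')")

def try_insert(recording_id, title):
    """Return True if the row was stored, False if the primary key rejected it."""
    try:
        conn.execute("INSERT INTO RECORDINGS VALUES (?, ?)", (recording_id, title))
        return True
    except sqlite3.IntegrityError:
        return False

duplicate_accepted = try_insert(1, "The E.N.D.")  # RecordingID 1 is already in use
new_accepted = try_insert(2, "The E.N.D.")        # RecordingID 2 is unused
```

The database refuses the second row with RecordingID 1, so each value of the key continues to identify exactly one record.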
data warehouse
a logical collection of information, gathered from many different operational databases, that supports business analysis activities and decision-making tasks. Data warehouses go a step further by standardizing information. Gender, for instance, can be recorded in many ways (Male, Female, M/F, 1/0), but a data warehouse should standardize it so that every data element that stores gender uses one common form (M/F). Standardizing data elements allows for greater accuracy, completeness, and consistency and increases the quality of the information used in making strategic business decisions.
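The gender standardization described above might look like the following sketch. The mapping table and the function name `standardize_gender` are hypothetical, and reading 1 as male and 0 as female is an assumption, since the notes do not say which code means which.

```python
# Hypothetical mapping from source-system values to the warehouse standard (M/F).
# Assumption: 1 means male and 0 means female in the numeric source system.
GENDER_STANDARD = {
    "male": "M", "m": "M", "1": "M",
    "female": "F", "f": "F", "0": "F",
}

def standardize_gender(raw):
    """Return 'M' or 'F', or None when the source value is not recognized."""
    key = str(raw).strip().lower()
    return GENDER_STANDARD.get(key)

source_values = ["Male", "F", 1, " female ", "unknown"]
standardized = [standardize_gender(v) for v in source_values]
```

Unrecognized values come back as None rather than being guessed, so they can be reviewed instead of silently loaded.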
information cleansing or scrubbing
a process that weeds out and fixes or discards inconsistent, incorrect, or incomplete information.
data lake
a storage repository that holds a vast amount of raw data in its original format until the business needs it. While a traditional data warehouse stores data in hierarchical files or folders, a data lake uses a flat architecture to store data.
data-driven decision management
an approach to business governance that values decisions that can be backed up with verifiable data. The success of the data-driven approach is reliant upon the quality of the data gathered and the effectiveness of its analysis and interpretation.
Dynamic website information is stored in a dynamic catalog
an area of a website that stores information about products in a database.
relational integrity constraints
are rules that enforce basic and fundamental information-based constraints
integrity constraints
are rules that help ensure the quality of information. The database design needs to consider integrity constraints. There are two types of integrity constraints: (1) relational and (2) business-critical.
comparative analysis
can compare two or more data sets to identify patterns and trends. Employees can base their decisions on data sets, experience, or knowledge or, preferably, a combination of all three.
data dictionary
compiles all of the metadata about the data elements in the data model. Looking at a data model along with reviewing the data dictionary provides tremendous insight into the database's functions, purpose, and business rules.
data mart
contains a subset of data warehouse information
content creator and content editor
content creator - is the person responsible for creating the original website content. content editor - the person responsible for updating and maintaining website content.
database management system (DBMS)
creates, reads, updates, and deletes data in a database while controlling access and security. Managers send requests to the DBMS, and the DBMS performs the actual manipulation of the data in the database.
data warehousing components
data mart, information cleansing, business intelligence
business rule
defines how a company performs certain aspects of its business and typically results in either a yes/no or true/false answer. Stating that merchandise returns are allowed within 10 days of purchase is an example of a business rule.
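The returns rule above produces a true/false answer, which a short sketch can show. The function name `return_allowed` is hypothetical, and treating day 10 itself as still inside the window is an assumption about how the rule is counted.

```python
from datetime import date

RETURN_WINDOW_DAYS = 10  # taken from the example rule above

def return_allowed(purchase_date, return_date):
    """Business rule: a return is allowed within 10 days of purchase (inclusive, by assumption)."""
    days_elapsed = (return_date - purchase_date).days
    return 0 <= days_elapsed <= RETURN_WINDOW_DAYS

on_time = return_allowed(date(2024, 3, 1), date(2024, 3, 11))   # 10 days later
too_late = return_allowed(date(2024, 3, 1), date(2024, 3, 12))  # 11 days later
```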
data visualization
describes technologies that allow users to see or visualize data to transform information into a business perspective.
business-critical integrity constraints
enforce business rules vital to an organization's success and often require more insight and knowledge than relational integrity constraints
DBMS use three primary data models for organizing information: hierarchical, network, and the relational database, the most prevalent
hierarchical, network and relational database (most prevalent)
source data
identifies the primary location where data is collected. Source data can include invoices, spreadsheets, time sheets, transactions, and electronic sources such as other databases. Managers send their information requests to the MIS department, where a dedicated person compiles the various reports. In some situations, responses can take days, by which time the information may be outdated and opportunities lost. Many organizations find themselves in the position of being data rich and information poor. Even in today's electronic world, managers struggle with the challenge of turning their business data into business intelligence.
data validation
includes the tests and evaluations used to determine compliance with data governance policies to ensure correctness of data. Data validation helps to ensure that every data value is correct and accurate.
data broker
is a business that collects personal information about consumers and sells that information to other organizations.
record
is a collection of related data elements ex. (3, Lady Gaga, gag.tiff, "Do not bring young kids to live shows")
foreign key
is a primary key of one table that appears as an attribute in another table and acts to provide a logical relationship between the two tables. For instance, Black Eyed Peas in Figure 6.7 is one of the musicians appearing in the MUSICIANS table. Its primary key, MusicianID, is "2." Notice that MusicianID also appears as an attribute in the RECORDINGS table. By matching these attributes, you create a relationship between the MUSICIANS and RECORDINGS tables that states the Black Eyed Peas (MusicianID 2) have several recordings, including The E.N.D., Monkey Business, and Elephunk. In essence, MusicianID in the RECORDINGS table creates a logical relationship (who was the musician that made the recording) to the MUSICIANS table. Creating the logical relationship between the tables allows managers to search the data and turn it into useful information.
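The MUSICIANS and RECORDINGS relationship described above can be sketched with Python's built-in sqlite3 module. The table names, MusicianID, and the recording titles come from the notes; the MusicianName and Title column names, the RecordingID values, and the rejected sample row are assumptions, not the contents of Figure 6.7.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE MUSICIANS (MusicianID INTEGER PRIMARY KEY, MusicianName TEXT)")
conn.execute("""CREATE TABLE RECORDINGS (
    RecordingID INTEGER PRIMARY KEY,
    Title TEXT,
    MusicianID INTEGER REFERENCES MUSICIANS(MusicianID))""")
conn.execute("INSERT INTO MUSICIANS VALUES (2, 'Black Eyed Peas')")
conn.executemany("INSERT INTO RECORDINGS VALUES (?, ?, ?)",
                 [(1, "The E.N.D.", 2), (2, "Monkey Business", 2)])

# Matching MusicianID across the two tables answers "who made each recording?"
rows = conn.execute("""SELECT m.MusicianName, r.Title
    FROM RECORDINGS r JOIN MUSICIANS m ON r.MusicianID = m.MusicianID
    ORDER BY r.RecordingID""").fetchall()

# A recording that points at a musician who does not exist is rejected.
try:
    conn.execute("INSERT INTO RECORDINGS VALUES (3, 'Unknown', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```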
data map
is a technique for establishing a match, or balance, between the source data and the target data warehouse. This technique identifies data shortfalls and recognizes data issues. Data maps can also alert managers to inconsistencies or help determine the cause and effects of enterprise-wide business decisions.
data point
is an individual item on a graph or a chart. Organizational data includes far more than simple structured data elements in a database; the set of data also includes unstructured data such as voice mail, customer phone calls, text messages, and video clips, along with numerous new forms of data, such as tweets from Twitter
data-driven website
is an interactive website kept constantly updated and relevant to the needs of its customers using a database
data set
is an organized collection of data.
dirty data
is erroneous or flawed data
data steward
is responsible for ensuring the policies and procedures are implemented across the organization and acts as a liaison between the MIS department and the business
information redundancy
is the duplication of data, or the storage of the same data in multiple places
data stewardship
is the management and oversight of an organization's data assets to help provide business users with high-quality data that is easily accessible in a consistent manner.
master data management (MDM)
is the practice of gathering data and ensuring that it is uniform, accurate, consistent, and complete, including such entities as customers, suppliers, products, sales, employees, and other critical entities that are commonly integrated across organizational systems
data latency
is the time it takes for data to be stored or retrieved.
data models
logical data structures that detail the relationships among data elements by using graphics or pictures.
database
maintains information about various types of objects (inventory), events (transactions), people (employees), and places (warehouses)
real-time information
means immediate, up-to-date information.
data visualization tools
move beyond Excel graphs and charts into sophisticated analysis techniques such as controls, instruments, maps, time-series graphs, and more
information integrity issues
occur when a system produces incorrect, inconsistent, or duplicate data. Data integrity issues can cause managers to consider the system reports invalid and make decisions based on other sources.
data gap analysis
occurs when a company examines its data to determine if it can meet business expectations, while identifying possible data gaps or where missing data might exist.
information inconsistency
occurs when the same data element has different values. ex. a customer marries and changes her last name, so the systems now hold two different last names for the same person.
analysis paralysis
occurs when the user goes into an emotional state of overanalysis (or overthinking) a situation so that a decision or action is never taken, in effect paralyzing the outcome. In the time of big data, analysis paralysis is a growing problem.
Data Mining Modeling Techniques for Predictions
optimization model - A statistical process that finds the way to make a design, system, or decision as effective as possible. forecasting model - Makes predictions based on time-series information, which is time-stamped information collected at a particular frequency. regression model - A statistical process for estimating the relationships among variables.
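As a sketch of the regression model above, the following fits a straight line by ordinary least squares and uses it to forecast the next period of a time series. The notes do not specify a fitting method, so least squares is an assumption, as are the function name `fit_line` and the made-up monthly sales figures.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical time-series information: sales recorded once per month.
months = [1, 2, 3, 4, 5]
sales = [12.0, 14.0, 16.0, 18.0, 20.0]

slope, intercept = fit_line(months, sales)
forecast_month_6 = slope * 6 + intercept
```

The fitted slope estimates how sales relate to time, and extending the line one month ahead is the simplest form of forecast.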
physical and logical view of information
physical view of information - deals with the physical storage of information on a storage device. logical view of information - focuses on how individual users logically access information to meet their own particular business needs.
infographics
present the results of data analysis, displaying the patterns, relationships, and trends in a graphical format.
extraction, transformation, and loading (ETL)
process that extracts information from internal and external databases, transforms it using a common set of enterprise definitions, and loads it into a data warehouse
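A toy sketch of the three ETL stages described above. The two source systems, their field names, the `sales_fact` table, and the chosen common definition (upper-case SKU, amount in dollars) are all hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import sqlite3

# Extract: rows as they might arrive from two hypothetical source systems.
store_a = [{"sku": "a1 ", "amount_cents": 1250}]
store_b = [{"product": "A1", "amount": "7.50"}]

# Transform: apply one common definition (upper-case SKU, amount in dollars).
def from_store_a(row):
    return (row["sku"].strip().upper(), row["amount_cents"] / 100)

def from_store_b(row):
    return (row["product"].strip().upper(), float(row["amount"]))

staged = [from_store_a(r) for r in store_a] + [from_store_b(r) for r in store_b]

# Load: write the standardized rows into a stand-in warehouse table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales_fact (sku TEXT, amount REAL)")
warehouse.executemany("INSERT INTO sales_fact VALUES (?, ?)", staged)
totals = warehouse.execute(
    "SELECT sku, SUM(amount) FROM sales_fact GROUP BY sku"
).fetchall()
```

Because both sources were transformed to the same definition before loading, their rows can be totaled together in the warehouse.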
distributed computing
processes and manages algorithms across many machines in a computing environment
real-time systems
provide real-time information in response to requests. Many organizations use real-time systems to uncover key corporate transactional information. The growing demand for real-time information stems from organizations' need to make faster and more effective decisions, keep smaller inventories, operate more efficiently, and track performance more carefully.
metadata
provides details about data. For example, metadata for an image could include its size, resolution, and date created. Metadata about a text document could contain document length, date created, author's name, and summary.
Two primary tools are available for retrieving information from a DBMS...
query-by-example (QBE) tool - helps users graphically design the answer to a question against a database. structured query language (SQL) - asks users to write lines of code to answer questions against a database. Managers typically interact with QBE tools, and MIS professionals have the skills required to code SQL.
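To show what "lines of code to answer questions against a database" can look like, here is a small SQL query run through Python's built-in sqlite3 module. The ORDERS table, its columns, the sample rows, and the business question are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ORDERS (OrderID INTEGER PRIMARY KEY, Customer TEXT, Total REAL)")
conn.executemany("INSERT INTO ORDERS VALUES (?, ?, ?)",
                 [(1, "Ann", 80.0), (2, "Bob", 30.0), (3, "Ann", 45.0)])

# Business question: which customers have spent more than 100 in total?
query = """
    SELECT Customer, SUM(Total) AS Spent
    FROM ORDERS
    GROUP BY Customer
    HAVING SUM(Total) > 100
    ORDER BY Customer
"""
big_spenders = conn.execute(query).fetchall()
```

A QBE tool would let a manager build the same request by picking fields and conditions on screen instead of typing the statement.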
information granularity
refers to the extent of detail within the information (fine and detailed or coarse and abstract)
data governance
refers to the overall management of the availability, usability, integrity, and security of company data
relational database model and relational database management system
relational database model - stores information in the form of logically related two-dimensional tables. relational database management system - allows users to create, read, update, and delete data in a relational database
static and dynamic information
static information - includes fixed data incapable of change in the event of a user action. dynamic information - includes data that change based on user actions. For example, static websites supply only information that will not change until the content editor changes the information. Dynamic information changes when a user requests information.
fast data
the application of big data analytics to smaller data sets in near-real or real-time in order to solve a problem or create business value. The term fast data is often associated with business intelligence, and the goal is to quickly gather and mine structured and unstructured data so that action can be taken.
data aggregation
the collection of data from various sources for the purpose of data processing
information cube
the common term for the representation of multidimensional information. The accompanying textbook figure displays a cube (Cube a) that represents store information (the layers), product information (the rows), and promotion information (the columns).
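One way to picture the store, product, and promotion dimensions above is a table keyed by all three, summarized along any one of them. This is only a sketch: the dictionary layout, the function name `roll_up`, and every value in it are assumptions, not the contents of the textbook figure.

```python
from collections import defaultdict

# (store, product, promotion) -> units sold; all values are made up.
cube = {
    ("Store 1", "Product A", "Promo I"): 10,
    ("Store 1", "Product B", "Promo I"): 4,
    ("Store 2", "Product A", "Promo I"): 7,
    ("Store 2", "Product A", "Promo II"): 3,
}
DIMENSIONS = ("store", "product", "promotion")

def roll_up(cube, dimension):
    """Total the cube along one dimension, collapsing the other two."""
    axis = DIMENSIONS.index(dimension)
    totals = defaultdict(int)
    for key, units in cube.items():
        totals[key[axis]] += units
    return dict(totals)

by_product = roll_up(cube, "product")
by_store = roll_up(cube, "store")
```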
virtualization
the creation of a virtual (rather than actual) version of computing resources, such as an operating system, a server, a storage device, or network resources
data mining
the process of analyzing data to extract information not offered by the raw data alone. Data mining can also begin at a summary information level (coarse granularity) and progress through increasing levels of detail (drilling down) or the reverse (drilling up). Data mining has three steps: (1) Data: the foundation for data-directed decision making. (2) Discovery: the process of identifying new patterns, trends, and insights. (3) Deployment: the process of implementing discoveries to drive success.
data profiling
the process of collecting statistics and information about data in an existing source. Insights extracted from data profiling can determine how easy or difficult it will be to use existing data for other purposes along with providing metrics on data quality
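A minimal sketch of the kind of statistics data profiling collects for a single column. The function name `profile_column`, the choice of metrics, and the sample email values are assumptions; real profiling tools report many more measures.

```python
def profile_column(values):
    """Collect simple data-quality statistics for one column."""
    non_missing = [v for v in values if v not in (None, "")]
    return {
        "rows": len(values),
        "missing": len(values) - len(non_missing),
        "distinct": len(set(non_missing)),
    }

# Hypothetical email column from an existing source.
emails = ["a@x.com", "", "b@x.com", None, "a@x.com"]
stats = profile_column(emails)
```

A high missing count or an unexpectedly low distinct count signals how hard it would be to reuse the column for another purpose.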
data element (or data field)
the smallest or basic unit of information. Data elements can include a customer's name, address, email, discount rate, preferred shipping method, product name, quantity ordered, and so on.
data quality audits
are reviews a company conducts to determine the accuracy and completeness of its data
business intelligence dashboards
track corporate metrics such as critical success factors and key performance indicators and include advanced capabilities such as interactive controls, allowing users to manipulate data for analysis.
data mining tools
use a variety of techniques to find patterns and relationships in large volumes of information that predict future behavior and guide decision making. Data mining uncovers trends and patterns, which analysts use to build models that, when exposed to new information sets, perform a variety of information analysis functions. Data mining tools for data warehouses help users uncover business intelligence in their data.