MISY CH 6: Data Business Intelligence
Regression Model
* A statistical process for estimating the relationships among variables. * Regression models include many techniques for modeling and analyzing several variables when the focus is on the relationship between a dependent variable and one or more independent variables: - Predict the winners of a marathon based on gender, height, weight, hours of training. - Explain how the quantity of weekly sales of a popular brand of beer depends on its price at a small chain of supermarkets.
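As a rough illustration of the beer example, the sketch below fits a straight line (ordinary least squares) relating price to weekly quantity sold. The observations and the `fit_line` helper are made up for illustration; they are not from the chapter.

```python
# Hypothetical weekly observations: (price in dollars, cases sold).
observations = [(8.0, 120), (9.0, 110), (10.0, 100), (11.0, 90), (12.0, 80)]


def fit_line(points):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Covariance-like and variance-like sums (unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(observations)
# Predicted weekly quantity at a price that was not observed.
predicted_quantity = intercept + slope * 9.5
```

Here price is the independent variable and quantity sold is the dependent variable; the slope estimates how many fewer cases sell for each one-dollar price increase.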
Forecasting Model
* Time-series information is time-stamped information collected at a particular frequency. * Forecasts are predictions based on time-series information, allowing users to manipulate the time series for forecasting activities. - Web visits per hour - Sales per month - Customer service calls per day
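One simple way to turn a time series into a forecast is a moving average. The sketch below is only one of many forecasting techniques, and the monthly sales figures and the `moving_average_forecast` helper are hypothetical.

```python
# Hypothetical time series: sales per month, oldest first.
monthly_sales = [100, 120, 110, 130, 140, 150]


def moving_average_forecast(series, window=3):
    """Forecast the next value as the mean of the last `window` values."""
    if len(series) < window:
        raise ValueError("series is shorter than the averaging window")
    return sum(series[-window:]) / window


next_month_forecast = moving_average_forecast(monthly_sales)
```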
Information Type
* Transactional (operational) * Analytical (managerial)
Why would you want to define access level security?
- Access levels will typically mimic the hierarchical structure of the organization and protect organizational information from being viewed and manipulated by individuals who should not have access to the sensitive or confidential information - Low level employees typically have the lowest levels of access - High level employees typically have access to all types of database information For example: You would not want analysts viewing all salary information for the entire company - in general: * Analysts can usually only view their own salary * Managers have higher access and can view the salaries of all their team members, but cannot view other managers' salaries * Directors can view all of their managers' and analysts' salaries, but not other directors' salaries * The CFO and CEO can view every employee's salary
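The salary example can be sketched as a rule over a reporting chain: a viewer may see their own salary and the salary of anyone below them. The names and the `reports_to` table are hypothetical, and this simplified model has a single top-level role, so a CFO with company-wide access would need an extra rule.

```python
# Hypothetical reporting chain: employee -> direct supervisor (None = top).
reports_to = {
    "ceo": None,
    "director_a": "ceo",
    "director_b": "ceo",
    "manager_a": "director_a",
    "manager_b": "director_b",
    "analyst_a": "manager_a",
    "analyst_b": "manager_b",
}


def can_view_salary(viewer, target):
    """Return True if `viewer` is `target` or sits above `target` in the chain."""
    current = target
    while current is not None:
        if current == viewer:
            return True
        current = reports_to[current]  # walk one level up the hierarchy
    return False
```

For example, `can_view_salary("manager_a", "analyst_a")` is allowed, while `can_view_salary("manager_a", "manager_b")` is not.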
Contact Information in Operational Systems
- Billing - Customer Service - Marketing - Sales
Data mining uncovers patterns and trends such as:
- Building budgets and other financial information - Detecting fraud by identifying deceptive spending patterns - Finding the best customers who spend the most money - Keeping customers from leaving or migrating to competitors - Promoting and hiring employees to ensure success
define the relationship between a data point, data broker, and data lake
- Data points are two pieces of raw data that have an intersection or correlation - Data points are in a data lake - A data broker collects data in a data lake
Supporting Decisions with Business Intelligence
- Data warehouses extend the transformation of data into information - In the 1990s executives became less concerned with the day-to-day business operations and more concerned with overall business functions - The data warehouse provided the ability to support decision making without disrupting the day-to-day operations
Reduced Information Redundancy
- Databases reduce information redundancy - Inconsistency is one of the primary problems with redundant information
Additional Data Driven Website Advantages
- Development: Allows the website owner to make changes any time, all without having to rely on a developer or knowing HTML programming. A well-structured, data-driven website enables updating with little or no training.
- Content management: A static website requires a programmer to make updates. This adds an unnecessary layer between the business and its Web content, which can lead to misunderstandings and slow turnarounds for desired changes.
- Future expandability: Having a data-driven website enables the site to grow faster than would be possible with a static site. Changing the layout, displays, and functionality of the site (adding more features and sections) is easier with a data-driven solution.
- Minimizing human error: Even the most competent programmer charged with the task of maintaining many pages will overlook things and make mistakes. This will lead to bugs and inconsistencies that can be time consuming and expensive to track down and fix. Unfortunately, users who come across these bugs will likely become irritated and may leave the site. A well-designed, data-driven website will have "error trapping" mechanisms to ensure that required information is filled out correctly and that content is entered and displayed in its correct format.
- Cutting production and update costs: A data-driven website can be updated and "published" by any competent data entry or administrative person. In addition to being convenient and more affordable, changes and updates will take a fraction of the time that they would with a static site. While training a competent programmer can take months or even years, training a data entry person can be done in 30 to 60 minutes.
- More efficient: By their very nature, computers are excellent at keeping volumes of information intact. With a data-driven solution, the system keeps track of the templates, so users do not have to. Global changes to layout, navigation, or site structure would need to be programmed only once, in one place, and the site itself will take care of propagating those changes to the appropriate pages and areas. A data-driven infrastructure will improve the reliability and stability of a website, while greatly reducing the chance of "breaking" some part of the site when adding new areas.
- Improved stability: Any programmer who has to update a website from "static" templates must be very organized to keep track of all the source files. If a programmer leaves unexpectedly, it could involve re-creating existing work if those source files cannot be found. Plus, if there were any changes to the templates, the new programmer must be careful to use only the latest version. With a data-driven website, there is peace of mind, knowing the content is never lost, even if your programmer is.
The two primary computing models that have shaped the collection of big data include:
- Distributed computing - Processes and manages algorithms across many machines in a computing environment - Virtualization - The creation of a virtual (rather than actual) version of computing resources, such as an operating system, a server, a storage device, or network resources
The Solution: Business Intelligence
- Improving the quality of business decisions has a direct impact on costs and revenue - BI enables business users to receive data for analysis that is reliable, consistent, understandable, and easily manipulated. The result creates an agile, intelligent enterprise.
Potential business effects resulting from low quality information include:
- Inability to accurately track customers - Difficulty identifying valuable customers - Inability to identify selling opportunities - Marketing to nonexistent customers - Difficulty tracking revenue - Inability to build strong customer relationships
Reasons Business Analysis Is Difficult from Operational Databases
- Inconsistent Data Definitions - Lack of Data Standards - Poor Data Quality - Inadequate Data Usefulness - Ineffective Direct Data Access
Business Advantages of a Relational Database
- Increased flexibility - Increased information integrity - Increased scalability and performance - Increased information security - Reduced information redundancy
You never want to find yourself using technology to help you make a bad decision faster
- Information inconsistency - Information integrity issues
Prediction modeling techniques include:
- Optimization modeling - Forecasting modeling - Regression modeling
Business Intelligence
- Organizational data is difficult to access - Organizational data contains structured data in databases - Organizational data contains unstructured data such as voice mail, phone calls, text messages, and video clips
Can you define two business-critical integrity constraints for an ordering system?
- Product returns are not accepted for fresh product 15 days after purchase - A discount maximum of 20 percent
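These two business rules could be enforced in application code along the following lines. This is a minimal sketch: the constant and function names are hypothetical, and it assumes a fresh-product return is accepted up to and including day 15 after purchase.

```python
from datetime import date

# Business rules taken from the two example constraints.
MAX_DISCOUNT_PERCENT = 20
FRESH_RETURN_WINDOW_DAYS = 15  # assumption: day 15 itself is still accepted


def discount_allowed(discount_percent):
    """A discount may not exceed the 20 percent maximum."""
    return 0 <= discount_percent <= MAX_DISCOUNT_PERCENT


def fresh_return_allowed(purchase_date, return_date):
    """Fresh product may be returned only within the return window."""
    days_since_purchase = (return_date - purchase_date).days
    return 0 <= days_since_purchase <= FRESH_RETURN_WINDOW_DAYS
```

For example, `discount_allowed(25)` is rejected, and a return on `date(2024, 1, 20)` for a purchase on `date(2024, 1, 1)` falls outside the window.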
If they had to choose a percentage for acceptable information what would it be and why?
- Some companies are willing to go as low as 20% complete just to find business intelligence - Few organizations will go below 50% accurate; the information is useless if it is not accurate
What kinds of databases can be found around your college?
- Student registration - Course evaluation - Payroll - Parking services
Overview of a data-driven website
- The customer enters search criteria in the website - The database runs a query on the search criteria
Achieving perfect information is almost impossible
- The more complete and accurate an organization wants to get its information, the more it costs - The tradeoff for perfect information lies in accuracy versus completeness - Accurate information means it is correct, while complete information means there are no blanks - Most organizations determine a percentage high enough to make good decisions at a reasonable cost, such as 85% accurate and 65% complete
What is the primary difference between a database and data warehouse?
- The primary difference between a database and a data warehouse is that a database stores information for a single application, whereas a data warehouse stores information from multiple databases, or multiple applications, and external information such as industry information - This enables cross-functional analysis, industry analysis, market analysis, etc., all from a single repository - Data warehouses support only analytical processing (OLAP)
What would happen to a website that was not data-driven?
- The users would need to continually update the website data manually as the business data is updated. - This would be a redundant effort and most likely result in errors and the website could quickly become out of sync with the business data
Can you define two relational integrity constraints for an ordering system?
- Users cannot create an order for a nonexistent customer - An order cannot be shipped without an address
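A relational database can enforce both rules itself. The sketch below uses Python's built-in `sqlite3` module with hypothetical table and column names; it models "cannot be shipped without an address" in a simplified way, as a required address column on the order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute(
    "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute(
    """
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        ship_address TEXT NOT NULL
    )
    """
)
conn.execute("INSERT INTO customer VALUES (1, 'Ada')")
conn.execute("INSERT INTO orders VALUES (100, 1, '1 Main St')")


def try_insert(sql):
    """Return True if the insert succeeds, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# Order for a customer that does not exist: rejected by the foreign key.
unknown_customer_ok = try_insert("INSERT INTO orders VALUES (101, 99, '2 Oak Ave')")
# Order without an address: rejected by NOT NULL.
missing_address_ok = try_insert("INSERT INTO orders VALUES (102, 1, NULL)")
```

The `customer_id` column in `orders` is a foreign key pointing at the primary key of `customer`, which is what ties the two tables together.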
Dirty Data Problems
-Duplicate data -Misleading data -Incorrect data -Non-formatted data -Violates business rules data -Non-integrated data -Inaccurate data
Benefits of Good Information
-High quality information can significantly improve the chances of making a good decision -Good decisions can directly impact an organization's bottom line
A good way to explain databases is to compare them to spreadsheets. What are the limitations when using a spreadsheet?
-Limited number of rows and columns (older Excel versions: 65,536 rows by 256 columns; Excel 2007 and later: 1,048,576 rows by 16,384 columns). Once you need more rows than the limit, you have outgrown your spreadsheet - Only one user can access the spreadsheet - Users can view all information in the spreadsheet - Users can change all information in the spreadsheet
Relationship of a database with a DBMS
-The user interacts directly with the DBMS -The DBMS obtains the information from the database
Characteristics of High-Quality Information
1. Accurate 2. Complete 3. Consistent 4. Unique 5. Timely
The Four Primary Sources of Low Quality Information (Costs of Using LQI)
1. Customers intentionally enter inaccurate information to protect their privacy 2. Different entry standards and formats 3. Operators enter abbreviated or erroneous information by accident or to save time 4. Third party and external information contains inconsistencies, inaccuracies, and errors (Additional: A customer service representative could accidentally transpose a number in an address or misspell a last name)
Data-driven website advantages
1. Easy to manage content 2. Easy to store large amounts of data 3. Easy to eliminate human errors
The Four Primary Traits of the Value of Information
1. Type 2. Timeliness 3. Quality 4. Governance
The Business benefits of High-Quality Information
1. information is everywhere in an organization 2. employees must be able to obtain and analyze the many different levels, formats, and granularities of organizational information to make decisions 3. successfully collecting, compiling, sorting and analyzing information can provide tremendous insight into how an organization is performing
Competitive Monitoring
A company keeps tabs on its competitors' activities on the web using software that automatically tracks all competitor website activities such as discounts and new products.
How do you distinguish between a data warehouse and a data mart?
A data warehouse has an enterprisewide organizational focus, while a data mart focuses on a subset of information for a given business unit such as finance
Increased Scalability and Performance
A database must scale to meet increased demand, while maintaining acceptable performance levels
Primary Keys
A field (or group of fields) that uniquely identifies a given entity in a table
data warehouse
A logical collection of information - gathered from many different operational databases - that supports business analysis activities and decision-making tasks The primary purpose of a data warehouse is to aggregate information throughout an organization into a single repository for decision-making purposes
Entity
A person, place, thing, transaction, or event about which information is stored - The rows in a table contain entities
Foreign keys
A primary key of one table that appears as an attribute in another table and acts to provide a logical relationship between the two tables
Extraction, transformation, and loading (ETL)
A process that extracts information from internal and external databases, transforms the information using a common set of enterprise definitions, and loads the information into a data warehouse
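The three ETL steps can be sketched in a few lines. Everything here is hypothetical (source names, field names, sample rows); the point is that the transform step maps differently shaped source rows onto one common definition before loading.

```python
# Hypothetical source extracts with inconsistent field names and formats.
billing_rows = [{"cust": "C1", "amt": "100.50"}, {"cust": "C2", "amt": "20"}]
sales_rows = [{"customer_id": "c1", "amount_usd": 49.5}]


def extract():
    """Pull raw rows from each source, tagged with where they came from."""
    tagged = [("billing", row) for row in billing_rows]
    tagged += [("sales", row) for row in sales_rows]
    return tagged


def transform(source, row):
    """Map a source row onto the common enterprise definition."""
    if source == "billing":
        customer_id, amount = row["cust"], row["amt"]
    else:
        customer_id, amount = row["customer_id"], row["amount_usd"]
    return {
        "customer_id": customer_id.upper(),  # one standard ID format
        "amount": float(amount),             # one standard numeric type
        "source": source,
    }


def load(warehouse, records):
    """Append the standardized records to the warehouse store."""
    warehouse.extend(records)


warehouse = []
load(warehouse, [transform(source, row) for source, row in extract()])
total_for_c1 = sum(r["amount"] for r in warehouse if r["customer_id"] == "C1")
```

After loading, customer C1's billing and sales records can be analyzed together even though the two sources spelled the ID differently.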
information cleansing/scrubbing
A process that weeds out and fixes or discards inconsistent, incorrect, or incomplete information
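A minimal cleansing pass might standardize formats, discard incomplete records, and drop duplicates. The sample records and the `cleanse` helper below are hypothetical, and the sketch assumes that two records with the same normalized email are the same customer.

```python
raw_customers = [
    {"name": " Ada  Lovelace ", "email": "ADA@EXAMPLE.COM"},
    {"name": "Ada Lovelace", "email": "ada@example.com"},  # probable duplicate
    {"name": "Grace Hopper", "email": ""},                  # incomplete
]


def cleanse(records):
    """Return (cleaned, rejected): standardized unique rows and incomplete rows."""
    cleaned, rejected, seen_emails = [], [], set()
    for record in records:
        name = " ".join(record["name"].split())  # collapse stray whitespace
        email = record["email"].strip().lower()  # one standard email format
        if not name or not email:
            rejected.append(record)              # incomplete: set aside for repair
            continue
        if email in seen_emails:
            continue                             # duplicate: discard
        seen_emails.add(email)
        cleaned.append({"name": name, "email": email})
    return cleaned, rejected


cleaned_customers, rejected_customers = cleanse(raw_customers)
```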
Attributes (fields, columns)
A quality or feature regarded as a characteristic or inherent part of someone or something. The data elements associated with an entity - The columns in each table contain the attributes
Prediction
A statement about what will happen or might happen in the future; for example, predicting future sales or employee turnover
Optimization Model
A statistical process that finds the way to make a design, system, or decision as effective as possible; for example: - Finding the values of controllable variables that determine maximal productivity or minimal waste - Determining which products to produce given a limited amount of ingredients - Choosing a combination of projects to maximize overall earnings.
Increased Flexibility
A well-designed database should: - Handle changes quickly and easily - Provide users with different views - Have only one physical view - Have multiple logical views
Information cleansing or scrubbing
An organization must maintain high-quality data in the data warehouse; information cleansing or scrubbing removes dirty data
Social media analysis
Analyzes text flowing across the Internet, including unstructured text from blogs and messages
Text Analysis
Analyzes unstructured data to find trends and patterns in words and sentences. Text mining a firm's customer support email might identify which customer service representative is best able to handle the question, allowing the system to forward it to the right person.
Accuracy
Are all the values correct? For example, is the name spelled correctly? Is the dollar amount recorded properly?
Completeness
Are any of the values missing? For example, is the address complete including street, city, state, and zip code?
Classification
Assigns records to one of a predefined set of classes
Data scientists perform big data analytics using
Behavioral analysis, correlation analysis, exploratory data analysis, pattern recognition analysis, social media analysis, speech analysis, text analysis, web analysis
Information Quality
Business decisions are only as good as the quality of the information used to make the decisions
What does the student view display when a student accesses the school's student database?
Courses enrolled, grades, tuition, credits for graduation
Have only one physical view (increased flexibility)
Deals with the physical storage of information on a storage device
Data Visualization
Describes technologies that allow users to "see" or visualize data to transform information into a business perspective
Information Granularities
- Detail (fine): reports for each salesperson, product, and part - Summary: reports for all sales personnel, all products, and all parts - Aggregate (coarse): reports across departments, organizations, and companies
Estimates Analysis
Determines values for an unknown continuous variable behavior or estimated future value
Transactional Information
Encompasses all of the information contained within a single business process or unit of work, and its primary purpose is to support the performing of daily operational tasks. Ex: withdrawing cash from an ATM, making an airline reservation, purchasing stocks * Compile a list of additional transactional information examples: these could include daily sales, hourly employee payroll, product orders, shipping an order
Analytical Information
Encompasses all organizational information, and its primary purpose is to support the performing of managerial analysis tasks (includes transactional information) * Also includes external organizational information such as market, industry, and economic conditions * Is used to make ad-hoc decisions. Ex: trends, sales, product statistics, and future growth projections * Compile a list of additional analytical information examples: these could include cost/benefit analysis, sales forecast, market trends, industry trends, and regulations
Business-critical integrity constraint
Enforces business rules vital to an organization's success and often requires more insight and knowledge than relational integrity constraints.
Storing Data Elements in
Entities and Attributes
What is the cost to the business of sending multiple identical marketing materials to the same customers?
Expense; risk of alienating customers
Have multiple logical views (increased flexibility)
Focuses on how individual users logically access information to meet their own particular business needs
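In SQL, logical views are commonly built with `CREATE VIEW`: the data is stored once, and each group of users queries a view that exposes only what it needs. The sketch below uses Python's built-in `sqlite3` module; the table, view names, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One stored table holds all the data.
conn.execute("CREATE TABLE employee (name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [("Ada", "Finance", 90000), ("Grace", "Sales", 85000)],
)
# Two logical views over the same stored data.
conn.execute("CREATE VIEW directory AS SELECT name, department FROM employee")
conn.execute("CREATE VIEW payroll AS SELECT name, salary FROM employee")


def view_columns(view_name):
    """List the column names a given view exposes."""
    cursor = conn.execute(f"SELECT * FROM {view_name}")
    return [column[0] for column in cursor.description]


directory_columns = view_columns("directory")
payroll_columns = view_columns("payroll")
```

The directory view never exposes salaries, while payroll staff can see them, yet both read the same underlying table.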
Insurance:
Forecasting claim amounts and medical coverage costs; classifying the most important elements that affect medical coverage; predicting which customers will buy new insurance policies.
Banking:
Forecasting levels of bad loans and fraudulent credit card use, credit card spending by new customers, and which kinds of customers will best respond to (and qualify for) new loan offers
What kinds of additional entities might be found in this database?
INVENTORY, MARKETING CAMPAIGN, SALES PRICE, CUSTOMER INFORMATION, PAYMENT INFORMATION
Exploratory Data Analysis (EDA)
Identifies patterns in data, including outliers, uncovering the underlying structure to understand relationships between the variables.
What occurs when you have the inability to build strong customer relationships?
Increase Buyer Power
Increased Information Security
Information is an organizational asset and must be protected
Consistency
Is aggregate or summary information in agreement with detailed information? For example, do all total fields equal the true total of the individual fields?
Uniqueness
Is each transaction, entity, and event represented only once in the information? For example, are there any duplicate customers?
Timeliness
Is the information current with respect to the business requirements? For example, is information updated weekly, daily, or hourly?
The Problem: Data Rich, Information Poor
Many organizations find themselves in the position of being data rich and information poor Even in today's electronic world, managers struggle with the challenge of turning their business data into business intelligence
Low Quality Information Examples
- Missing information: Without a first name it would be impossible to correlate this customer with customers in other databases (Sales, Marketing, Billing, Customer Service) to gain a complete customer view (CRM).
- Incomplete information: Without a complete street address there is no possible way to communicate with this customer via mail or deliveries. An order might be sitting in a warehouse waiting for the complete address before shipping. The company has spent time and money processing an order that might never be completed.
- Probable duplicate information: If this is the same customer, the company will waste money sending out two sets of promotions and advertisements to the same customer. It might also send two identical orders and have to incur the expense of one order being returned.
- Potential wrong information: There are many times when a phone and a fax have the same number. Since the phone number is also in the e-mail address field, chances are that the number is inaccurate.
- Inaccurate information: The business would have no way of communicating with this customer via e-mail.
- Incomplete information: The company could determine the area code based on the customer's address.
All incorrect information needs to be fixed, which costs time and money.
Data Visualization tools
Move beyond Excel graphs and charts into sophisticated analysis techniques such as pie charts, controls, instruments, maps, time-series graphs, etc. - can help uncover correlations and trends in data that would otherwise go unrecognized.
What kinds of additional attributes might be found in the MUSICIAN table?
MusicianBand, MusicianEducation, MusicianAge, MusicianGender
Information is everywhere in a
Organization
Databases offer several security features
- Password: Provides authentication of the user - Access level: Determines who has access to the different types of information - Access control: Determines types of user access, such as read-only access
Operations management:
Predicting machinery failures; finding key factors that control optimization of manufacturing capacity.
Retail and sales:
Predicting sales; determining correct inventory levels and distribution schedules among outlets; and loss prevention.
Brokerage and securities trading:
Predicting when bond prices will change; forecasting the range of stock fluctuations for particular issues and the overall market; determining when to buy or sell stocks.
Creating Relationships Through Keys
Primary keys and foreign keys identify the various entities (tables) in the database
Real-time system
Provides real-time information in response to requests
Scalability
Refers to how well a system can adapt to increased demands
Affinity Grouping Analysis
Reveals the relationship between variables along with the nature and frequency of the relationships.
Structured Data and Unstructured Data Examples
Structured Data - Sensor data - Weblog data - Financial data - Click-stream data - Point of sale data - Accounting data Unstructured Data - Satellite images - Photographic data - Video data - Social Media data - Text message - Voice mail data
What would happen if a new database called "RealData" hit the market and allowed only one logical view?
The "RealData" database simply would never sell. With only one logical view every person in an entire organization would have the same view
Why is the ability to look at information based on different dimensions critical to a business's success?
The ability to look at information from different dimensions can add tremendous business insight - By slicing-and-dicing the information a business can uncover great unexpected insights
Information governance (IG)
The accountability framework and decision rights to achieve enterprise information management (EIM). IG is the responsibility of executive leadership for developing and driving the IG strategy throughout the organization. IG encompasses both data governance (DG) and information technology governance (ITG)
Velocity
The analysis of streaming data as it travels around the Internet - Analysis is necessary of social media messages spreading globally
BI in a Data-Driven Website
The customer enters search criteria in the website. The database runs a query on the search criteria. The company can gain BI by viewing how often items are searched, which item is searched the most, the least, etc. Companies can gain business intelligence by viewing the data accessed and analyzed from their website. The figure displays how running queries or using analytical tools, such as a Pivot Table, on the database that is attached to the website can offer insight into the business, such as items browsed, frequent requests, items bought together, etc.
Data Warehouse Model
The data warehouse modeled in the above figure compiles information from internal databases or transactional/operational databases and external databases through ETL. It then sends subsets of information to the data marts through the ETL process
poor data quality
The data, if available, were often incorrect or incomplete. Therefore, users could not rely on the data to make decisions
Information Formats
The different ways in which information can be presented using World Wide Web (WWW) technologies, such as document, presentation, spreadsheet, and database. Examples are: - Document: letters, memos, faxes, emails, reports, marketing materials, and training materials - Presentation: product, strategy, process, financial, customer, and competitor presentations - Spreadsheet: sales, marketing, industry, financial, competitor, customer, and order spreadsheets - Database: customer, employee, sales, order, supplier, and manufacturer databases
information redundancy
The duplication of data or storing the same information in multiple places
How BI Can Answer Tough Customer Questions
The figure displays how organizations using BI can find the root causes to problems and provide solutions simply by asking "Why?" The process is initiated by analyzing a global report, say of sales per quarter. Every answer is followed by a new question, and users can drill deep down into a report to get to fundamental causes. Once they have a clear understanding of root causes, they can take highly effective action. Finding the answers to tough business questions by using data that is reliable, consistent, understandable, and easily manipulated allows a business to gain valuable insight into such things as: - Where the business has been. Historical perspective is always important in determining trends and patterns of behavior. - Where it is now. Current situations are critical to either modify if not acceptable or encourage if they are trending in the right direction. - And where it will be in the near future. Being able to predict with surety the direction of the company is critical to sound planning and to creating sound business strategies.
Data Mining
The process of analyzing data to extract information not offered by the raw data alone. The three elements of data mining include: 1. Data - Foundation for data-directed decision making 2. Discovery - Process of identifying new patterns, trends, and insights 3. Deployment - Process of implementing discoveries to drive success
Speech analysis
The process of analyzing recorded calls to gather information; brings structure to customer interactions and exposes information buried in customer contact center interactions with an enterprise. Speech analysis is heavily used in the customer service department to help improve processes by identifying angry customers and routing them to the appropriate customer service representative
Data Analysis
The process of compiling, analyzing, and interpreting the results of primary and secondary data collection.
Volume
The scale of data - Includes enormous volumes of data generated daily - Massive volume created by machines and networks - Big data tools necessary to analyze zettabytes and brontobytes
Data Element
The smallest or basic unit of information
Veracity
The uncertainty of data, including biases, noise, and abnormalities - Uncertainty or untrustworthiness of data - Data must be meaningful to the problem being analyzed - Must keep data clean and implement processes to keep dirty data from accumulating in systems
A data steward is responsible for ensuring data policies and procedures are implemented across an organization
True
As businesses increase their reliance on enterprise systems such as CRM, they are rapidly accumulating vast amounts of data. Every interaction between departments or with the outside world, historical information on past transactions, as well as external market information, is entered into information systems for future use and access.
True
Big data depends on distributed computing environments and virtualization. Distributed computing allows processes to be run over multiple machines, taking advantage of greater processing power. Virtualization allows the cost of big data to decrease since multiple applications can run on one machine. It also helps reduce e-waste!
True
Data integrity issues can cause managers to consider the system reports invalid and make decisions based on other sources.
True
Data-driven websites are especially useful when the site offers a great deal of information, products, or services. Website visitors are frequently angered if they are buried under an avalanche of information when searching a website. A data-driven website invites visitors to select and view what they are interested in by inserting a query, which the website then analyzes to custom-build a Web page in real time that satisfies the query.
True
Information cleansing allows an organization to fix these types of inconsistencies and cleans the data in the data warehouse
True
Organizations capture and store transactional information in databases and use it when performing operational tasks and repetitive decisions such as - analyzing daily sales reports - production schedules
True
Poor information could cause a CRM system to send an expensive promotional item (such as a fruit basket) to the wrong address of one of its best customers
True
Poor information could cause the SCM system to order too much inventory from a supplier based on inaccurate orders
True
The ETL process gathers data from the internal and external databases and passes it to the data warehouse. The ETL process also gathers data from the data warehouse and passes it to the data marts. Each layer in a data warehouse or data mart represents information according to an additional dimension. Dimensions could include such things as: products, promotions, stores, category, region, stock price, date, time, and weather.
True
A Cube of Information for Performing a Multidimensional Analysis on Three Stores for Five Products and Four Promotions
Users can slice and dice the cube to drill down into the information. Cube A represents store information (the layers), product information (the rows), and promotion information (the columns). Cube B represents a slice of information displaying promotion II for all products at all stores. Cube C represents a slice of information displaying promotion III for product B at store 2.
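The three cubes can be imitated with a dictionary keyed by (store, product, promotion). This is only a sketch: the cell values are made-up placeholders generated from the indexes, and `slice_cube` is a hypothetical helper, not how an OLAP tool stores data.

```python
stores = ["Store 1", "Store 2", "Store 3"]
products = ["A", "B", "C", "D", "E"]
promotions = ["I", "II", "III", "IV"]

# Cube A: every (store, product, promotion) cell, filled with placeholder values.
cube_a = {
    (store, product, promo): (si + 1) * 100 + (pi + 1) * 10 + (mi + 1)
    for si, store in enumerate(stores)
    for pi, product in enumerate(products)
    for mi, promo in enumerate(promotions)
}


def slice_cube(cube, store=None, product=None, promotion=None):
    """Keep only the cells that match every dimension value that was given."""
    return {
        key: value
        for key, value in cube.items()
        if (store is None or key[0] == store)
        and (product is None or key[1] == product)
        and (promotion is None or key[2] == promotion)
    }


# Cube B: promotion II for all products at all stores.
cube_b = slice_cube(cube_a, promotion="II")
# Cube C: promotion III for product B at store 2 (a single cell).
cube_c = slice_cube(cube_a, store="Store 2", product="B", promotion="III")
```

Fixing one dimension yields a two-dimensional slice (Cube B), and fixing all three drills down to one cell (Cube C).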
data broker
a business that collects personal information about consumers and sells that information to other organizations
Big Data
a collection of large, complex data sets, including structured and unstructured data, which cannot be analyzed using traditional database methods and tools and includes the following four common characteristics: - variety - veracity - volume - velocity
Record
a collection of related data elements
data lake
a storage repository that holds a vast amount of raw data in its original format until the business needs it
Data map
a technique for establishing a match, or balance, between the source data and the target data warehouse - This technique identifies data shortfalls and recognizes data issues. Data maps can also alert managers to inconsistencies or help determine the cause and effects of enterprise-wide business decisions.
Cluster Analysis
a technique used to divide an information set into mutually exclusive groups such that the members of each group are as close together as possible to one another and the different groups are as far apart as possible - Consumer goods by content, brand loyalty or similarity - Product market typology for tailoring sales strategies - Retail store layouts and sales performances - Corporate decision strategies using social preferences - Control, communication, and distribution of organizations - Industry processes, products, and materials - Design of assembly line control functions - Character recognition logic in OCR readers - Database relationships in management information systems
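One common clustering technique is k-means, shown here in one dimension as a minimal sketch. The chapter does not name a specific algorithm, so k-means is a swapped-in example; the customer spend values, the starting centers, and the `k_means_1d` helper are all hypothetical.

```python
def k_means_1d(values, centers, iterations=10):
    """Repeatedly assign each value to its nearest center, then recenter."""
    groups = [[] for _ in centers]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for value in values:
            nearest = min(range(len(centers)), key=lambda i: abs(value - centers[i]))
            groups[nearest].append(value)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [
            sum(group) / len(group) if group else center
            for group, center in zip(groups, centers)
        ]
    return centers, groups


# Hypothetical weekly spend per customer, in dollars.
weekly_spend = [10, 12, 11, 95, 100, 105]
centers, groups = k_means_1d(weekly_spend, centers=[0.0, 50.0])
```

Each customer lands in exactly one group (mutually exclusive), with low spenders near one center and high spenders near the other.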
Database Management System (DBMS)
allows users to create, read, update, and delete data in a relational database
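The four operations can be sketched with Python's built-in `sqlite3` module, used here as a stand-in for a full DBMS. The `products` table and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)")

# Create
conn.execute("INSERT INTO products (id, name, price) VALUES (?, ?, ?)", (1, "Widget", 9.99))
# Read
row = conn.execute("SELECT name, price FROM products WHERE id = ?", (1,)).fetchone()
# Update
conn.execute("UPDATE products SET price = ? WHERE id = ?", (7.5, 1))
updated = conn.execute("SELECT price FROM products WHERE id = ?", (1,)).fetchone()[0]
# Delete
conn.execute("DELETE FROM products WHERE id = ?", (1,))
remaining = conn.execute("SELECT COUNT(*) FROM products").fetchone()[0]

print(row, updated, remaining)
```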
data-driven decision management
an approach to business governance that values decisions that can be backed up with verifiable data - The success of the data-driven approach is reliant upon the quality of the data gathered and the effectiveness of its analysis and interpretation.
data point
an individual item on a graph or a chart
data-driven website
an interactive website kept constantly updated and relevant to the needs of its customers using a database Ex: - Content creator - Content editor - Static information - Dynamic information - Dynamic catalog
Web Analysis
analyzes unstructured data associated with websites to identify consumer behavior and website navigation
List of the Different Types of Ad-Hoc decisions a business might base on Analytical Information
building a new plant, hiring or reducing workforces, introducing a new product
Data Dictionary
compiles all of the metadata about the data elements in the data model
data mart
contains a subset of data warehouse information
Data Visualization
describes technologies that allow users to see or visualize data to transform information into a business perspective
Metadata
details about data
Correlation analysis
determines a statistical relationship between variables, often for the purpose of identifying predictive factors among the variables
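One widely used measure of such a relationship is the Pearson correlation coefficient, which ranges from -1 to 1. The notes do not specify a measure, so this is an illustrative sketch with hypothetical price and sales figures.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical data: units sold fall steadily as price rises.
price = [1.0, 2.0, 3.0, 4.0, 5.0]
units = [50, 40, 30, 20, 10]
r = pearson(price, units)
print(r)
```

A value near -1 suggests that price may be a useful predictive factor for units sold in this made-up data set; it does not by itself show that price causes the change.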
Variety
different forms of structured and unstructured data - data from spreadsheets and databases as well as from email, videos, photos, and PDFs, all of which must be analyzed
Dirty Data
erroneous or flawed data
Inconsistent Data Definitions
every department had its own method for recording data so when trying to share information, data did not match and users did not get the data they really needed
Real-time information
immediate, up-to-date information
Data Validation
includes the tests and evaluations used to determine compliance with data governance policies to ensure correctness of data
Information Levels
individual, department, enterprise • Individual knowledge, goals, and strategies • Departmental goals, revenues, expenses, processes, and strategies • Enterprise revenues, expenses, processes, and strategies
Data artist
is a business analytics specialist who uses visual tools to help people understand complex data.
Recommendation Engine
is a data mining algorithm that analyzes a customer's purchases and actions on a website and then uses the data to recommend complementary products.
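A very simple form of this idea counts which items are bought together. The sketch below assumes hypothetical purchase baskets and a hypothetical `recommend` helper; real recommendation engines are considerably more elaborate.

```python
from collections import Counter

# Hypothetical purchase baskets, one set of items per order.
baskets = [
    {"tent", "sleeping bag", "lantern"},
    {"tent", "sleeping bag"},
    {"tent", "stove"},
    {"lantern", "batteries"},
]

def recommend(item, baskets, top_n=2):
    """Return the items most often bought together with `item`."""
    counts = Counter()
    for basket in baskets:
        if item in basket:
            counts.update(basket - {item})  # count every co-purchased item
    return [other for other, _ in counts.most_common(top_n)]

top_pick = recommend("tent", baskets, top_n=1)
print(top_pick)
```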
Information Timeliness
is an aspect of information that depends on the situation * Real-time information * Real-time system
Data Profiling
is the process of collecting statistics and information about data in an existing source. Insights extracted from data profiling can determine how easy or difficult it will be to use existing data for other purposes along with providing metrics on data quality.
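A minimal profiling sketch, assuming the source data arrives as a list of dictionaries. The `profile` function and the customer rows are hypothetical; the statistics chosen (row count, missing values, distinct values) are only examples of data quality metrics.

```python
def profile(rows):
    """Per-column row count, missing-value count, and distinct-value count."""
    columns = {key for row in rows for key in row}
    report = {}
    for col in sorted(columns):
        values = [row.get(col) for row in rows]
        present = [v for v in values if v not in (None, "")]
        report[col] = {
            "rows": len(values),
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
        }
    return report

# Hypothetical source table.
customers = [
    {"id": 1, "state": "DE", "email": "a@example.com"},
    {"id": 2, "state": "DE", "email": ""},
    {"id": 3, "state": "PA", "email": None},
]
report = profile(customers)
print(report["email"])
```

Here the `email` column is missing in two of three rows, which hints that it would be hard to reuse this source for an email campaign.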
Data Replication
is the process of sharing information to ensure consistency between multiple data sources. Data mining can determine relationships among such internal factors as price, product positioning, or staff skills, and external factors such as economic indicators, competition, and customer demographics
Data Model
logical data structures that detail the relationships among data elements using graphics or pictures
Database
maintains information about various types of objects (inventory), events (transactions), people (employees), and places (warehouses)
lack of data standards
managers needed to perform cross-functional analysis using data from all departments, which differed in granularities, formats, and levels
Algorithms
mathematical formulas placed in software that perform an analysis on a data set - Anomaly detection: the process of identifying rare or unexpected items or events in a data set that do not conform to other items in the data set - Outlier: a data value that is numerically distant from most of the other data points in a set of data - Analysis paralysis: occurs when the user goes into an emotional state of over-analyzing (or over-thinking) a situation so that a decision or action is never taken, in effect paralyzing the outcome
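A simple anomaly detection sketch that flags outliers by z-score (how many standard deviations a value sits from the mean). The z-score rule and the threshold of 2.0 are assumptions chosen for illustration, and the call counts are made up.

```python
from statistics import mean, pstdev

def find_outliers(values, threshold=2.0):
    """Return values whose z-score magnitude exceeds `threshold`."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical customer service calls per day.
daily_calls = [10, 12, 11, 13, 12, 11, 95]
outliers = find_outliers(daily_calls)
print(outliers)
```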
Performance
measures how quickly a system performs a certain process or transaction
Information Cleansing Activities
missing records or attributes, redundant records, missing keys or other required data, erroneous relationships or references, inaccurate data
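Two of these problems, missing keys and redundant records, can be caught with a short pass over the data. This sketch assumes records are dictionaries keyed by a hypothetical `customer_id` field; the `cleanse` helper is illustrative only.

```python
def cleanse(records, key="customer_id"):
    """Set aside records with a missing key or a key that was already seen."""
    clean, rejected, seen = [], [], set()
    for rec in records:
        k = rec.get(key)
        if k is None:
            rejected.append(rec)  # missing key
        elif k in seen:
            rejected.append(rec)  # redundant record
        else:
            seen.add(k)
            clean.append(rec)
    return clean, rejected

# Hypothetical customer records.
records = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": None, "name": "Bob"},
    {"customer_id": 2, "name": "Cy"},
]
clean, rejected = cleanse(records)
print(len(clean), len(rejected))
```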
Ineffective Direct Data Access
most data stored in operational databases did not allow users direct access; users had to wait to have their queries or questions answered by MIS professionals who could code SQL
Information Integrity Issues
occur when a system produces incorrect, inconsistent, or duplicate data
Data Gap Analysis
occurs when a company examines its data to determine if it can meet business expectations, while identifying possible data gaps or where missing data might exist
Information Inconsistency
occurs when the same data element has different values
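A sketch of how such a conflict could be detected across operational systems, assuming each system exposes records with a shared `customer_id`. The system names, phone numbers, and the `find_inconsistencies` helper are hypothetical.

```python
from collections import defaultdict

def find_inconsistencies(systems, field):
    """Map each customer id to the values seen for `field`, keeping only conflicts."""
    seen = defaultdict(set)
    for records in systems.values():
        for rec in records:
            seen[rec["customer_id"]].add(rec[field])
    return {cid: vals for cid, vals in seen.items() if len(vals) > 1}

# Hypothetical records from two operational systems.
systems = {
    "billing": [{"customer_id": 7, "phone": "555-0100"}],
    "sales": [
        {"customer_id": 7, "phone": "555-0199"},
        {"customer_id": 8, "phone": "555-0111"},
    ],
}
conflicts = find_inconsistencies(systems, "phone")
print(conflicts)
```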
Infographics
present the results of data analysis, displaying the patterns, relationships, and trends in a graphical format - Infographics can present the results of large data analysis looking for patterns and relationships that monitor changes in variables over time.
One of the primary goals of a database is to eliminate information redundancy by
recording each piece of information in only one place
Granularity
refers to the extent of detail within the information (fine and detailed or "coarse" and abstract information)
Data Governance
refers to the overall management of the availability, usability, integrity, and security of company data
relational integrity constraint
rules that enforce basic and fundamental information-based constraints
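A typical example is refusing an order for a customer that does not exist. The sketch below shows this with a foreign key in Python's built-in `sqlite3` module; the tables and values are hypothetical, and the example is not taken from the notes above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers(id))"
)
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Ada')")
conn.execute("INSERT INTO orders (id, customer_id) VALUES (10, 1)")  # valid reference

try:
    # Customer 99 does not exist, so the database should refuse this order.
    conn.execute("INSERT INTO orders (id, customer_id) VALUES (11, 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(rejected, order_count)
```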
Addressing the above sources of information inaccuracies will
significantly improve the quality of organizational information
Fast data
the application of big data analytics to smaller data sets in near-real or real-time in order to solve a problem or create business value
Pattern recognition analysis
the classification or labeling of an identified pattern in the machine learning process
data aggregation
the collection of data from various sources for the purpose of data processing
Data stewardship
the management and oversight of an organization's data assets to help provide business users with high-quality data that is easily accessible in a consistent manner
Master Data Management (MDM)
the practice of gathering data and ensuring that it is uniform, accurate, consistent, and complete, including such entities as customers, suppliers, products, sales, employees, and other critical entities that are commonly integrated across organizational systems
Business Intelligence dashboards
track corporate metrics such as critical success factors and key performance indicators and include advanced capabilities such as interactive controls allowing users to manipulate data for analysis. The majority of business intelligence software vendors offer a number of different data visualization tools and business intelligence dashboards
Information is stored in databases
true
The billing system has "accounts payable" customer contact information
true
The customer service system has the "product user" customer contact information
true
The marketing and sales systems have "decision maker" customer information
true
inadequate data usefulness
users could not get the data they needed; what was collected was not always useful for intended purposes
Data mining tools
use a variety of techniques to find patterns and relationships in large volumes of information that predict future behavior and guide decision making
Behavioral analysis
using data about people's behaviors to understand intent and predict future actions
Increased Information Integrity (Quality)
• Information integrity - measures the quality of information • Integrity constraint - rules that help ensure the quality of information - Relational integrity constraint - Business-critical integrity constraint