MISY CH 6: Data Business Intelligence

Regression Model

* A statistical process for estimating the relationships among variables. * Regression models include many techniques for modeling and analyzing several variables when the focus is on the relationship between a dependent variable and one or more independent variables: - Predict the winners of a marathon based on gender, height, weight, and hours of training. - Explain how the quantity of weekly sales of a popular brand of beer depends on its price at a small chain of supermarkets.
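
As a minimal sketch of the beer example above, the code below fits a one-variable least-squares line relating weekly units sold (dependent variable) to price (independent variable). The prices and quantities are made-up illustration data, not figures from the chapter, and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: price per six-pack and weekly units sold at that price.
prices = [6.0, 7.0, 8.0, 9.0, 10.0]
units_sold = [200, 180, 160, 140, 120]

slope, intercept = fit_line(prices, units_sold)
predicted_at_8_50 = intercept + slope * 8.5

print(f"Change in weekly units per $1 of price: {slope:.1f}")
print(f"Predicted weekly units at $8.50: {predicted_at_8_50:.0f}")
```

The slope is the "explain" part (how sales move with price) and the prediction at a new price is the "predict" part.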

Forecasting Model

* Time-series information is time-stamped information collected at a particular frequency. * Forecasts are predictions based on time-series information, allowing users to manipulate the time series for forecasting activities. - Web visits per hour - Sales per month - Customer service calls per day
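
One simple way to turn time-series information into a forecast is a moving average, sketched below. This is only one of many forecasting techniques and is not named in the chapter; the monthly sales figures and the function name `moving_average_forecast` are assumptions for illustration.

```python
def moving_average_forecast(series, window):
    """Forecast the next period as the mean of the last `window` observations."""
    if window <= 0 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    recent = series[-window:]
    return sum(recent) / window

# Hypothetical time series: sales per month, oldest first.
monthly_sales = [120, 130, 125, 140, 150, 145]

next_month = moving_average_forecast(monthly_sales, window=3)
print(f"Forecast for next month: {next_month:.1f}")
```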

Information Type

* Transactional (operational) * Analytical (managerial)

Why would you want to define access-level security?

- Access levels typically mimic the hierarchical structure of the organization and protect organizational information from being viewed and manipulated by individuals who should not have access to sensitive or confidential information
- Low-level employees typically have the lowest levels of access
- High-level employees typically have access to all types of database information
For example, you would not want analysts viewing all salary information for the entire company. In general:
* Analysts can usually view only their own salary
* Managers have higher access and can view the salaries of all their team members, but cannot view other managers' salaries
* Directors can view all of their managers' and analysts' salaries, but not other directors' salaries
* The CFO and CEO can view every employee's salary
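
The salary rules above can be sketched as a small access check. Everything here is hypothetical: the `Employee` class, the role names, and the sample organization chart are assumptions made for illustration, not part of any particular DBMS.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Employee:
    name: str
    role: str                      # "analyst", "manager", "director", or "executive"
    manager: Optional[str] = None  # name of the direct supervisor, if any

def reports_to(staff, employee, boss_name):
    """True if boss_name appears anywhere above employee in the reporting chain."""
    current = employee.manager
    while current is not None:
        if current == boss_name:
            return True
        current = staff[current].manager
    return False

def can_view_salary(staff, viewer_name, target_name):
    viewer, target = staff[viewer_name], staff[target_name]
    if viewer.role == "executive":   # CEO and CFO can view every salary
        return True
    if viewer_name == target_name:   # everyone can view their own salary
        return True
    # Managers and directors can view only the people beneath them.
    return viewer.role in {"manager", "director"} and reports_to(staff, target, viewer_name)

# Hypothetical organization chart.
staff = {e.name: e for e in [
    Employee("Chen", "executive"),
    Employee("Dana", "director", manager="Chen"),
    Employee("Dev", "director", manager="Chen"),
    Employee("Mia", "manager", manager="Dana"),
    Employee("Max", "manager", manager="Dana"),
    Employee("Ana", "analyst", manager="Mia"),
]}

print(can_view_salary(staff, "Mia", "Ana"))  # manager viewing a team member
print(can_view_salary(staff, "Mia", "Max"))  # manager viewing another manager
```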

Contact Information in Operational Systems

- Billing - Customer Service - Marketing - Sales

Data mining uncovers patterns and trends such as:

- Building budgets and other financial information - Detecting fraud by identifying deceptive spending patterns - Finding the best customers who spend the most money - Keeping customers from leaving or migrating to competitors - Promoting and hiring employees to ensure success

Define the relationship between a data point, data broker, and data lake

- Data points are two pieces of raw data that have an intersection or correlation - Data points are in a data lake - A data broker collects data in a data lake

Supporting Decisions with Business Intelligence

- Data warehouses extend the transformation of data into information - In the 1990s, executives became less concerned with day-to-day business operations and more concerned with overall business functions - The data warehouse provided the ability to support decision making without disrupting day-to-day operations

Reduced Information Redundancy

- Databases reduce information redundancy - Inconsistency is one of the primary problems with redundant information

Additional Data Driven Website Advantages

- Development: Allows the website owner to make changes any time, all without having to rely on a developer or knowing HTML programming. A well-structured, data-driven website enables updating with little or no training.
- Content management: A static website requires a programmer to make updates. This adds an unnecessary layer between the business and its Web content, which can lead to misunderstandings and slow turnarounds for desired changes.
- Future expandability: Having a data-driven website enables the site to grow faster than would be possible with a static site. Changing the layout, displays, and functionality of the site (adding more features and sections) is easier with a data-driven solution.
- Minimizing human error: Even the most competent programmer charged with the task of maintaining many pages will overlook things and make mistakes. This will lead to bugs and inconsistencies that can be time consuming and expensive to track down and fix. Unfortunately, users who come across these bugs will likely become irritated and may leave the site. A well-designed, data-driven website will have "error trapping" mechanisms to ensure that required information is filled out correctly and that content is entered and displayed in its correct format.
- Cutting production and update costs: A data-driven website can be updated and "published" by any competent data entry or administrative person. In addition to being convenient and more affordable, changes and updates will take a fraction of the time that they would with a static site. While training a competent programmer can take months or even years, training a data entry person can be done in 30 to 60 minutes.
- More efficient: By their very nature, computers are excellent at keeping volumes of information intact. With a data-driven solution, the system keeps track of the templates, so users do not have to. Global changes to layout, navigation, or site structure would need to be programmed only once, in one place, and the site itself will take care of propagating those changes to the appropriate pages and areas. A data-driven infrastructure will improve the reliability and stability of a website, while greatly reducing the chance of "breaking" some part of the site when adding new areas.
- Improved stability: Any programmer who has to update a website from "static" templates must be very organized to keep track of all the source files. If a programmer leaves unexpectedly, it could involve re-creating existing work if those source files cannot be found. Plus, if there were any changes to the templates, the new programmer must be careful to use only the latest version. With a data-driven website, there is peace of mind, knowing the content is never lost, even if your programmer is.

The two primary computing models that have shaped the collection of big data include:

- Distributed computing - Processes and manages algorithms across many machines in a computing environment - Virtualization - The creation of a virtual (rather than actual) version of computing resources, such as an operating system, a server, a storage device, or network resources

The Solution: Business Intelligence

- Improving the quality of business decisions has a direct impact on costs and revenue - BI enables business users to receive data for analysis that is reliable, consistent, understandable, and easily manipulated - The result creates an agile, intelligent enterprise

Potential business effects resulting from low quality information include:

- Inability to accurately track customers - Difficulty identifying valuable customers - Inability to identify selling opportunities - Marketing to nonexistent customers - Difficulty tracking revenue - Inability to build strong customer relationships

Reasons Business Analysis Is Difficult from Operational Databases

- Inconsistent Data Definitions - Lack of Data Standards - Poor Data Quality - Inadequate Data Usefulness - Ineffective Direct Data Access

Business Advantages of a Relational Database

- Increased flexibility - Increased information integrity - Increased scalability and performance - Increased information security - Reduced information redundancy

You never want to find yourself using technology to help you make a bad decision faster

- Information inconsistency - Information integrity issues

Prediction modeling techniques include:

- Optimization modeling - Forecasting modeling - Regression modeling

Business Intelligence

- Organizational data is difficult to access - Organizational data contains structured data in databases - Organizational data contains unstructured data such as voice mail, phone calls, text messages, and video clips

Can you define two business-critical integrity constraints for an ordering system?

- Product returns are not accepted for fresh product 15 days after purchase - A discount maximum of 20 percent
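
A minimal sketch of how these two business rules might be checked in application code. The function names are hypothetical, and the sketch assumes that day 15 is the last day a fresh-product return is still accepted, which the rule as stated leaves open.

```python
from datetime import date

MAX_DISCOUNT = 0.20            # discount maximum of 20 percent
FRESH_RETURN_WINDOW_DAYS = 15  # assumption: day 15 itself is still accepted

def discount_allowed(discount):
    """A discount is valid only between 0 and the 20 percent maximum."""
    return 0.0 <= discount <= MAX_DISCOUNT

def fresh_return_allowed(purchase_date, return_date):
    """Fresh product can be returned only within the window after purchase."""
    days_since_purchase = (return_date - purchase_date).days
    return 0 <= days_since_purchase <= FRESH_RETURN_WINDOW_DAYS

print(discount_allowed(0.25))
print(fresh_return_allowed(date(2024, 1, 1), date(2024, 1, 10)))
```

Unlike relational integrity constraints, these rules encode business policy, so the thresholds are the part most likely to change.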

If they had to choose a percentage for acceptable information what would it be and why?

- Some companies are willing to go as low as 20% complete just to find business intelligence - Few organizations will go below 50% accurate; the information is useless if it is not accurate

What kinds of databases can be found around your college?

- Student registration - Course evaluation - Payroll - Parking services

Overview of a data-driven website

- The customer enters search criteria in the website - The database runs a query on the search criteria

Achieving perfect information is almost impossible

- The more complete and accurate an organization wants its information to be, the more it costs - The tradeoff for perfect information lies in accuracy versus completeness - Accurate information means it is correct, while complete information means there are no blanks - Most organizations determine a percentage high enough to make good decisions at a reasonable cost, such as 85% accurate and 65% complete

What is the primary difference between a database and data warehouse?

- The primary difference between a database and a data warehouse is that a database stores information for a single application, whereas a data warehouse stores information from multiple databases, or multiple applications, and external information such as industry information - This enables cross-functional analysis, industry analysis, market analysis, etc., all from a single repository - Data warehouses support only analytical processing (OLAP)

What would happen to a website that was not data-driven?

- The users would need to continually update the website data manually as the business data is updated. - This would be a redundant effort and most likely result in errors and the website could quickly become out of sync with the business data

Can you define two relational integrity constraints for an ordering system?

- Users cannot create an order for a nonexistent customer - An order cannot be shipped without an address
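
Both rules can be enforced by the database itself. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the table and column names are assumptions for illustration. The foreign key blocks orders for nonexistent customers, and `NOT NULL` on the shipping address is one simple way to model "cannot be shipped without an address".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,          -- primary key
        name        TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id     INTEGER PRIMARY KEY,
        customer_id  INTEGER NOT NULL REFERENCES customer(customer_id),  -- foreign key
        ship_address TEXT NOT NULL
    );
""")
conn.execute("INSERT INTO customer (customer_id, name) VALUES (1, 'Pat Lee')")
conn.execute("INSERT INTO orders (customer_id, ship_address) VALUES (1, '12 Main St')")

def try_insert(customer_id, ship_address):
    """Return True if the order is accepted, False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO orders (customer_id, ship_address) VALUES (?, ?)",
            (customer_id, ship_address),
        )
        return True
    except sqlite3.IntegrityError:
        return False

rejected_unknown_customer = not try_insert(99, "5 Oak Ave")  # there is no customer 99
rejected_missing_address = not try_insert(1, None)           # no shipping address
print(rejected_unknown_customer, rejected_missing_address)
```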

Dirty Data Problems

- Duplicate data - Misleading data - Incorrect data - Non-formatted data - Data that violates business rules - Non-integrated data - Inaccurate data

Benefits of Good Information

- High-quality information can significantly improve the chances of making a good decision - Good decisions can directly impact an organization's bottom line

A good way to explain databases is to compare them to spreadsheets. What are the limitations when using a spreadsheet?

- Limited number of rows and columns (Excel 2003 and earlier: 65,536 rows by 256 columns; later versions allow 1,048,576 rows by 16,384 columns). Once you exceed the row limit you have outgrown your spreadsheet - Only one user can access the spreadsheet - Users can view all information in the spreadsheet - Users can change all information in the spreadsheet

Relationship of a database with a DBMS

- The user interacts directly with the DBMS - The DBMS obtains the information from the database

Characteristics of High-Quality Information

1. Accurate 2. Complete 3. Consistent 4. Unique 5. Timely

The Four Primary Sources of Low Quality Information (Costs of Using LQI)

1. Customers intentionally enter inaccurate information to protect their privacy 2. Different entry standards and formats 3. Operators enter abbreviated or erroneous information by accident or to save time 4. Third party and external information contains inconsistencies, inaccuracies, and errors (Additional: A customer service representative could accidentally transpose a number in an address or misspell a last name)

Data-driven website advantages

1. Easy to manage content 2. Easy to store large amounts of data 3. Easy to eliminate human errors

The Four Primary Traits of the Value of Information

1. Type 2. Timeliness 3. Quality 4. Governance

The Business benefits of High-Quality Information

1. Information is everywhere in an organization 2. Employees must be able to obtain and analyze the many different levels, formats, and granularities of organizational information to make decisions 3. Successfully collecting, compiling, sorting, and analyzing information can provide tremendous insight into how an organization is performing

Competitive Monitoring

A company keeps tabs on its competitors' activities on the web using software that automatically tracks all competitor website activities such as discounts and new products.

Distinguish between a data warehouse and a data mart

A data warehouse has an enterprise-wide organizational focus, while a data mart focuses on a subset of information for a given business unit such as finance

Increased Scalability and Performance

A database must scale to meet increased demand, while maintaining acceptable performance levels

Primary Keys

A field (or group of fields) that uniquely identifies a given entity in a table

data warehouse

A logical collection of information - gathered from many different operational databases - that supports business analysis activities and decision-making tasks. The primary purpose of a data warehouse is to aggregate information throughout an organization into a single repository for decision-making purposes

Entity

A person, place, thing, transaction, or event about which information is stored - The rows in a table contain entities

Foreign keys

A primary key of one table that appears as an attribute in another table and acts to provide a logical relationship between the two tables

Extraction, transformation, and loading (ETL)

A process that extracts information from internal and external databases, transforms the information using a common set of enterprise definitions, and loads the information into a data warehouse
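
A toy sketch of the three ETL steps. The two source systems, their field names, and the state-code mapping are all hypothetical, and a plain Python list stands in for the data warehouse.

```python
# Extract: rows as they might arrive from two hypothetical operational sources.
billing_rows = [{"cust": "001", "state": "Calif.", "total": "100.50"}]
sales_rows = [{"customer_id": 2, "st": "ca", "amount": 75}]

# A common enterprise definition: every state is stored as a two-letter code.
STATE_CODES = {"calif.": "CA", "ca": "CA", "california": "CA"}

def transform(row, id_key, state_key, amount_key):
    """Map a source row onto the warehouse's shared field names and formats."""
    return {
        "customer_id": int(row[id_key]),
        "state": STATE_CODES[str(row[state_key]).strip().lower()],
        "amount": float(row[amount_key]),
    }

def run_etl():
    warehouse = []  # Load target: a list standing in for the data warehouse
    for row in billing_rows:
        warehouse.append(transform(row, "cust", "state", "total"))
    for row in sales_rows:
        warehouse.append(transform(row, "customer_id", "st", "amount"))
    return warehouse

warehouse = run_etl()
for record in warehouse:
    print(record)
```

After the transform step both sources use the same field names, types, and state codes, which is what makes cross-source analysis possible.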

information cleansing/scrubbing

A process that weeds out and fixes or discards inconsistent, incorrect, or incomplete information
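
A minimal cleansing sketch, assuming customer records with `name` and `email` fields (both hypothetical). It fixes inconsistent formatting, discards incomplete or obviously incorrect rows, and drops duplicates; real cleansing tools apply far richer rules.

```python
def cleanse(records):
    """Standardize fields, discard incomplete rows, and drop duplicates."""
    seen_emails = set()
    clean = []
    for rec in records:
        name = " ".join(rec.get("name", "").split()).title()  # fix spacing and case
        email = rec.get("email", "").strip().lower()
        if not name or "@" not in email:  # incomplete or obviously incorrect
            continue
        if email in seen_emails:          # duplicate customer
            continue
        seen_emails.add(email)
        clean.append({"name": name, "email": email})
    return clean

# Hypothetical dirty customer records.
dirty = [
    {"name": "  pat   lee ", "email": "Pat.Lee@Example.com"},
    {"name": "Pat Lee", "email": "pat.lee@example.com "},
    {"name": "", "email": "nobody@example.com"},
    {"name": "Sam Roy", "email": "not-an-email"},
]
cleaned = cleanse(dirty)
print(cleaned)
```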

Attributes (fields, columns)

A quality or feature regarded as a characteristic or inherent part of someone or something. The data elements associated with an entity - The columns in each table contain the attributes

Prediction

A statement about what will happen or might happen in the future; for example, predicting future sales or employee turnover

Optimization Model

A statistical process that finds the way to make a design, system, or decision as effective as possible; for example: - Finding the values of controllable variables that determine maximal productivity or minimal waste - Determining which products to produce given a limited amount of ingredients - Choosing a combination of projects to maximize overall earnings.
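
The "combination of projects" example can be sketched as an exhaustive search over every subset that fits a budget. The project costs, earnings, and budget below are invented for illustration, and brute force like this is workable only for a handful of projects; real optimization models use dedicated solvers.

```python
from itertools import combinations

# Hypothetical projects: name -> (cost, expected earnings), in thousands of dollars.
projects = {"A": (40, 60), "B": (30, 50), "C": (50, 65), "D": (20, 25)}
BUDGET = 90

def best_portfolio(projects, budget):
    """Try every subset and keep the affordable one with the highest earnings."""
    best_names, best_earnings = (), 0
    names = sorted(projects)
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            cost = sum(projects[n][0] for n in combo)
            earnings = sum(projects[n][1] for n in combo)
            if cost <= budget and earnings > best_earnings:
                best_names, best_earnings = combo, earnings
    return best_names, best_earnings

chosen, earnings = best_portfolio(projects, BUDGET)
print(chosen, earnings)
```

Here the controllable variable is which projects to fund, the constraint is the budget, and the objective is total earnings.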

Increased Flexibility

A well-designed database should: - Handle changes quickly and easily - Provide users with different views - Have only one physical view - Have multiple logical views

Information cleansing or scrubbing

An organization must maintain high-quality data in the data warehouse; information cleansing or scrubbing removes dirty data

Social media analysis

Analyzes text flowing across the Internet, including unstructured text from blogs and messages

Text Analysis

Analyzes unstructured data to find trends and patterns in words and sentences. Text mining a firm's customer support email might identify which customer service representative is best able to handle the question, allowing the system to forward it to the right person.

Accuracy

Are all the values correct? For example, is the name spelled correctly? Is the dollar amount recorded properly?

Completeness

Are any of the values missing? For example, is the address complete including street, city, state, and zip code?

Classification

Assigns records to one of a predefined set of classes

Data scientists perform big data analytics using

Behavioral analysis, correlation analysis, exploratory data analysis, pattern recognition analysis, social media analysis, speech analysis, text analysis, web analysis

Information Quality

Business decisions are only as good as the quality of the information used to make the decisions

What does the student view display when a student accesses the school's student database?

Courses enrolled, grades, tuition, credits for graduation

Have only one physical view (increased flexibility)

Deals with the physical storage of information on a storage device

Data Visualization

Describes technologies that allow users to "see" or visualize data to transform information into a business perspective

Information Granularities

- Detail (fine): reports for each salesperson, product, and part - Summary: reports for all sales personnel, all products, and all parts - Aggregate (coarse): reports across departments, organizations, and companies
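
A rough sketch of moving from fine to coarse granularity by rolling the same records up to higher levels. The sales rows and the `roll_up` helper are hypothetical, and the three levels shown only loosely mirror the detail, summary, and aggregate reports listed above.

```python
from collections import defaultdict

# Hypothetical detail-level records: one row per sale.
sales = [
    {"salesperson": "Ana", "department": "East", "amount": 100},
    {"salesperson": "Ana", "department": "East", "amount": 50},
    {"salesperson": "Ben", "department": "East", "amount": 70},
    {"salesperson": "Cy", "department": "West", "amount": 30},
]

def roll_up(rows, key):
    """Total the sale amounts for each distinct value of `key`."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row["amount"]
    return dict(totals)

by_salesperson = roll_up(sales, "salesperson")       # finer: one figure per salesperson
by_department = roll_up(sales, "department")         # coarser: one figure per department
company_total = sum(row["amount"] for row in sales)  # coarsest: one figure overall

print(by_salesperson, by_department, company_total)
```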

Estimation Analysis

Determines values for an unknown continuous variable behavior or estimated future value

Transactional Information

Encompasses all of the information contained within a single business process or unit of work, and its primary purpose is to support the performing of daily operational tasks. Ex: withdrawing cash from an ATM, making an airline reservation, purchasing stocks * Compile a list of additional transactional information examples: these could include daily sales, hourly employee payroll, product orders, and shipping an order

Analytical Information

Encompasses all organizational information, and its primary purpose is to support the performing of managerial analysis tasks (includes transactional information) * Also includes external organizational information such as market, industry, and economic conditions * Is used to make ad hoc decisions. Ex: trends, sales, product statistics, and future growth projections * Compile a list of additional analytical information examples: these could include cost/benefit analysis, sales forecasts, market trends, industry trends, and regulations

Business-critical integrity constraint

Enforces business rules vital to an organization's success and often requires more insight and knowledge than relational integrity constraints.

Storing Data Elements in

Entities and Attributes

What is the cost to the business of sending multiple identical marketing materials to the same customers?

Expense; risk of alienating customers

Have multiple logical views (increased flexibility)

Focuses on how individual users logically access information to meet their own particular business needs
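
One physical table can back several logical views. The sketch below uses Python's built-in `sqlite3` module; the `employee` table and the two view names are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One physical table holds all of the stored information.
    CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, department TEXT, salary REAL);
    INSERT INTO employee VALUES (1, 'Ana', 'Sales', 52000), (2, 'Ben', 'Finance', 61000);

    -- Two logical views, each exposing only what one group of users needs.
    CREATE VIEW directory_view AS SELECT name, department FROM employee;
    CREATE VIEW payroll_view AS SELECT id, name, salary FROM employee;
""")

# cursor.description lists the columns a query returns; the name is the first field.
directory_columns = [col[0] for col in conn.execute("SELECT * FROM directory_view").description]
payroll_columns = [col[0] for col in conn.execute("SELECT * FROM payroll_view").description]

print(directory_columns)
print(payroll_columns)
```

The directory view never exposes salaries, even though the salary column lives in the same physical table.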

Insurance:

Forecasting claim amounts and medical coverage costs; classifying the most important elements that affect medical coverage; predicting which customers will buy new insurance policies.

Banking:

Forecasting levels of bad loans and fraudulent credit card use, credit card spending by new customers, and which kinds of customers will best respond to (and qualify for) new loan offers.

What kinds of additional entities might be found in this database?

INVENTORY, MARKETING CAMPAIGN, SALES PRICE, CUSTOMER INFORMATION, PAYMENT INFORMATION

Exploratory Data Analysis (EDA)

Identifies patterns in data, including outliers, uncovering the underlying structure to understand relationships between the variables.

What occurs when you have the inability to build strong customer relationships?

Increased buyer power

Increased Information Security

Information is an organizational asset and must be protected

Consistency

Is aggregate or summary information in agreement with detailed information? For example, do all total fields equal the true total of the individual fields?
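
A small sketch of the "do the totals match" check. The order structure and field names are hypothetical; the tolerance allows for floating-point rounding in currency arithmetic.

```python
def totals_consistent(order, tolerance=0.005):
    """Check that the stored total agrees with the sum of the line items."""
    line_sum = sum(item["qty"] * item["unit_price"] for item in order["items"])
    return abs(line_sum - order["total"]) <= tolerance

# Hypothetical orders: the same line items with a correct and an incorrect total.
items = [{"qty": 2, "unit_price": 10.00}, {"qty": 1, "unit_price": 5.00}]
good_order = {"total": 25.00, "items": items}
bad_order = {"total": 30.00, "items": items}

print(totals_consistent(good_order), totals_consistent(bad_order))
```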

Uniqueness

Is each transaction, entity, and event represented only once in the information? For example, are there any duplicate customers?

Timeliness

Is the information current with respect to the business requirements? For example, is information updated weekly, daily, or hourly?

The Problem: Data Rich, Information Poor

Many organizations find themselves in the position of being data rich and information poor. Even in today's electronic world, managers struggle with the challenge of turning their business data into business intelligence

Low Quality Information Examples

- Missing Information: Without a first name it would be impossible to correlate this customer with customers in other databases (Sales, Marketing, Billing, Customer Service) to gain a complete customer view (CRM)
- Incomplete Information: Without a complete street address there is no possible way to communicate with this customer via mail or deliveries. An order might be sitting in a warehouse waiting for the complete address before shipping. The company has spent time and money processing an order that might never be completed
- Probable Duplicate Information: If this is the same customer, the company will waste money sending out two sets of promotions and advertisements to the same customer. It might also send two identical orders and have to incur the expense of one order being returned
- Potential Wrong Information: There are many times when a phone and a fax have the same number. Since the phone number is also in the e-mail address field, chances are that the number is inaccurate
- Inaccurate Information: The business would have no way of communicating with this customer via e-mail
- Incomplete Information: The company could determine the area code based on the customer's address
All incorrect information needs to be fixed, which costs time and money

Data Visualization tools

Moves beyond Excel graphs and charts into sophisticated analysis techniques such as pie charts, controls, instruments, maps, time-series graphs, etc. - can help uncover correlations and trends in data that would otherwise go unrecognized.

What kinds of additional attributes might be found in the MUSICIAN table?

MusicianBand, MusicianEducation, MusicianAge, MusicianGender

Information is everywhere in a

Organization

Databases offer several security features

- Password: Provides authentication of the user - Access level: Determines who has access to the different types of information - Access control: Determines types of user access, such as read-only access

Operations management:

Predicting machinery failures; finding key factors that control optimization of manufacturing capacity.

Retail and sales:

Predicting sales; determining correct inventory levels and distribution schedules among outlets; and loss prevention.

Brokerage and securities trading:

Predicting when bond prices will change; forecasting the range of stock fluctuations for particular issues and the overall market; determining when to buy or sell stocks.

Creating Relationships Through Keys

Primary keys and foreign keys identify the various entities (tables) in the database

Real-time system

Provides real-time information in response to requests

Scalability

Refers to how well a system can adapt to increased demands

Affinity Grouping Analysis

Reveals the relationship between variables along with the nature and frequency of the relationships.

Structured Data and Unstructured Data Examples

Structured Data: - Sensor data - Weblog data - Financial data - Click-stream data - Point of sale data - Accounting data
Unstructured Data: - Satellite images - Photographic data - Video data - Social media data - Text messages - Voice mail data

What would happen if a new database called "RealData" hit the market and allowed only one logical view?

The "RealData" database simply would never sell. With only one logical view every person in an entire organization would have the same view

Why is the ability to look at information based on different dimensions critical to a business's success?

The ability to look at information from different dimensions can add tremendous business insight - By slicing-and-dicing the information a business can uncover great unexpected insights

Information governance (IG)

The accountability framework and decision rights to achieve enterprise information management (EIM). IG is the responsibility of executive leadership for developing and driving the IG strategy throughout the organization. IG encompasses both data governance (DG) and information technology governance (ITG)

Velocity

The analysis of streaming data as it travels around the Internet - Analysis is necessary for social media messages spreading globally

BI in a Data-Driven Website

- The customer enters search criteria in the website - The database runs a query on the search criteria - The company can gain BI by viewing how often items are searched, which item is searched the most, the least, etc. Companies can gain business intelligence by viewing the data accessed and analyzed from their website. The figure displays how running queries or using analytical tools, such as a Pivot Table, on the database that is attached to the website can offer insight into the business, such as items browsed, frequent requests, items bought together, etc.

Data Warehouse Model

The data warehouse modeled in the above figure compiles information from internal databases or transactional/operational databases and external databases through ETL. It then sends subsets of information to the data marts through the ETL process

poor data quality

The data, if available, were often incorrect or incomplete. Therefore, users could not rely on the data to make decisions

Information Formats

The different ways in which information can be presented using World Wide Web (WWW) technologies, such as document, presentation, spreadsheet, and database. Examples are: - Document: letters, memos, faxes, emails, reports, marketing materials, and training materials - Presentation: product, strategy, process, financial, customer, and competitor presentations - Spreadsheet: sales, marketing, industry, financial, competitor, customer, and order spreadsheets - Database: customer, employee, sales, order, supplier, and manufacturer databases

information redundancy

The duplication of data or storing the same information in multiple places

How BI Can Answer Tough Customer Questions

The figure displays how organizations using BI can find the root causes of problems and provide solutions simply by asking "Why?" The process is initiated by analyzing a global report, say of sales per quarter. Every answer is followed by a new question, and users can drill deep down into a report to get to fundamental causes. Once they have a clear understanding of root causes, they can take highly effective action. Finding the answers to tough business questions by using data that is reliable, consistent, understandable, and easily manipulated allows a business to gain valuable insight into such things as:
- Where the business has been. Historical perspective is always important in determining trends and patterns of behavior.
- Where it is now. Current situations are critical to either modify if not acceptable or encourage if they are trending in the right direction.
- Where it will be in the near future. Being able to predict with surety the direction of the company is critical to sound planning and to creating sound business strategies.

Data Mining

The process of analyzing data to extract information not offered by the raw data alone. The three elements of data mining include: 1. Data - Foundation for data-directed decision making 2. Discovery - Process of identifying new patterns, trends, and insights 3. Deployment - Process of implementing discoveries to drive success

Speech analysis

The process of analyzing recorded calls to gather information; brings structure to customer interactions and exposes information buried in customer contact center interactions with an enterprise. Speech analysis is heavily used in the customer service department to help improve processes by identifying angry customers and routing them to the appropriate customer service representative

Data Analysis

The process of compiling, analyzing, and interpreting the results of primary and secondary data collection.

Volume

The scale of data - Includes enormous volumes of data generated daily - Massive volume created by machines and networks - Big data tools necessary to analyze zettabytes and brontobytes

Data Element

The smallest or basic unit of information

Veracity

The uncertainty of data, including biases, noise, and abnormalities - Uncertainty or untrustworthiness of data - Data must be meaningful to the problem being analyzed - Must keep data clean and implement processes to keep dirty data from accumulating in systems

A data steward is responsible for ensuring data policies and procedures are implemented across an organization

True

As businesses increase their reliance on enterprise systems such as CRM, they are rapidly accumulating vast amounts of data. Every interaction between departments or with the outside world, historical information on past transactions, as well as external market information, is entered into information systems for future use and access.

True

Big data depends on distributed computing environments and virtualization. Distributed computing allows processes to be run over multiple machines, taking advantage of greater processing power. Virtualization allows the cost of big data to decrease since multiple applications can run on one machine. It also helps reduce e-waste!

True

Data integrity issues can cause managers to consider the system reports invalid and make decisions based on other sources.

True

Data-driven websites are especially useful when the site offers a great deal of information, products, or services. Website visitors are frequently angered if they are buried under an avalanche of information when searching a website. A data-driven website invites visitors to select and view what they are interested in by inserting a query, which the website then analyzes to custom build a Web page in real time that satisfies the query.

True

Information cleansing allows an organization to fix these types of inconsistencies and cleans the data in the data warehouse

True

Organizations capture and store transactional information in databases and use it when performing operational tasks and repetitive decisions such as - analyzing daily sales reports - production schedules

True

Poor information could cause a CRM system to send an expensive promotional item (such as a fruit basket) to the wrong address of one of its best customers

True

Poor information could cause the SCM system to order too much inventory from a supplier based on inaccurate orders

True

The ETL process gathers data from the internal and external databases and passes it to the data warehouse. The ETL process also gathers data from the data warehouse and passes it to the data marts. Each layer in a data warehouse or data mart represents information according to an additional dimension. Dimensions could include such things as: products, promotions, stores, category, region, stock price, date, time, and weather

True

A Cube of Information for Performing a Multidimensional Analysis on Three Stores for Five Products and Four Promotions

Users can slice and dice the cube to drill down into the information. Cube A represents store information (the layers), product information (the rows), and promotion information (the columns). Cube B represents a slice of information displaying promotion II for all products at all stores. Cube C represents a slice of information displaying promotion III for product B at store 2
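
A minimal sketch of slicing such a cube, using a dictionary keyed by (store, product, promotion). The unit counts are generated placeholders, not data from the figure, and `slice_cube` is a hypothetical helper.

```python
stores = [1, 2, 3]
products = ["A", "B", "C", "D", "E"]
promotions = ["I", "II", "III", "IV"]

# cube[(store, product, promotion)] = units sold; every value is made up.
cube = {
    (s, p, promo): s * 10 + products.index(p) + promotions.index(promo)
    for s in stores for p in products for promo in promotions
}

def slice_cube(cube, store=None, product=None, promotion=None):
    """Keep only the cells that match every dimension value that was given."""
    return {
        key: value for key, value in cube.items()
        if (store is None or key[0] == store)
        and (product is None or key[1] == product)
        and (promotion is None or key[2] == promotion)
    }

cube_b = slice_cube(cube, promotion="II")                         # all products at all stores
cube_c = slice_cube(cube, store=2, product="B", promotion="III")  # a single cell

print(len(cube), len(cube_b), cube_c)
```

Fixing one dimension gives a two-dimensional slice (Cube B); fixing all three drills down to one cell (Cube C).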

data broker

a business that collects personal information about consumers and sells that information to other organizations

Big Data

a collection of large, complex data sets, including structured and unstructured data, which cannot be analyzed using traditional database methods and tools and includes the following four common characteristics: - variety - veracity - volume - velocity

Record

a collection of related data elements

data lake

a storage repository that holds a vast amount of raw data in its original format until the business needs it

Data map

a technique for establishing a match, or balance, between the source data and the target data warehouse - This technique identifies data shortfalls and recognizes data issues. Data maps can also alert managers to inconsistencies or help determine the cause and effects of enterprise-wide business decisions.

Cluster Analysis

a technique used to divide an information set into mutually exclusive groups such that the members of each group are as close together as possible to one another and the different groups are as far apart as possible - Consumer goods by content, brand loyalty or similarity - Product market typology for tailoring sales strategies - Retail store layouts and sales performances - Corporate decision strategies using social preferences - Control, communication, and distribution of organizations - Industry processes, products, and materials - Design of assembly line control functions - Character recognition logic in OCR readers - Data base relationships in management information systems

Database Management System (DBMS)

allows users to create, read, update, and delete data in a relational database
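The four operations can be tried with SQLite through Python's built-in `sqlite3` module. This is a minimal sketch; the `customer` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")

# Create
conn.execute("INSERT INTO customer (id, name, city) VALUES (?, ?, ?)", (1, "Ada", "Denver"))
conn.execute("INSERT INTO customer (id, name, city) VALUES (?, ?, ?)", (2, "Grace", "Boston"))

# Read
rows_after_insert = conn.execute(
    "SELECT id, name, city FROM customer ORDER BY id"
).fetchall()

# Update
conn.execute("UPDATE customer SET city = ? WHERE id = ?", ("Chicago", 1))

# Delete
conn.execute("DELETE FROM customer WHERE id = ?", (2,))

rows_final = conn.execute("SELECT id, name, city FROM customer ORDER BY id").fetchall()
conn.close()
```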

data-driven decision management

an approach to business governance that values decisions that can be backed up with verifiable data - The success of the data-driven approach is reliant upon the quality of the data gathered and the effectiveness of its analysis and interpretation.

data point

an individual item on a graph or a chart

data-driven website

an interactive website kept constantly updated and relevant to the needs of its customers using a database Ex: content creator, content editor, static information, dynamic information, dynamic catalog

Web Analysis

analyzes unstructured data associated with websites to identify consumer behavior and website navigation

List of the Different Types of Ad-Hoc decisions a business might base on Analytical Information

building a new plant, hiring or reducing workforces, introducing a new product

Data Dictionary

compiles all of the metadata about the data elements in the data model

data mart

contains a subset of data warehouse information

Data Visualization

describes technologies that allow users to see or visualize data to transform information into a business perspective

Metadata

details about data

Correlation analysis

determines a statistical relationship between variables, often for the purpose of identifying predictive factors among the variables
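A standard measure of such a relationship is the Pearson correlation coefficient, which ranges from -1 to 1. The card does not name a specific measure, so treat this as one possible sketch; the price and sales numbers are invented.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical data: unit price vs. units sold per week
price = [1.0, 2.0, 3.0, 4.0, 5.0]
units = [50, 44, 39, 31, 25]
r = pearson(price, units)  # close to -1: sales fall as price rises
```

A value near -1 or 1 suggests the variable may be a useful predictive factor; a value near 0 suggests no linear relationship.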

Variety

different forms of structured and unstructured data - data from spreadsheets and databases as well as from email, videos, photos, and PDFs, all of which must be analyzed

Dirty Data

erroneous or flawed data

Inconsistent Data Definitions

every department had its own method for recording data so when trying to share information, data did not match and users did not get the data they really needed

Real-time information

immediate, up-to-date information

Data Validation

includes the tests and evaluations used to determine compliance with data governance policies to ensure correctness of data

Information Levels

individual, department, enterprise
• Individual knowledge, goals, and strategies
• Departmental goals, revenues, expenses, processes, and strategies
• Enterprise revenues, expenses, processes, and strategies

Data artist

is a business analytics specialist who uses visual tools to help people understand complex data.

Recommendation Engine

is a data mining algorithm that analyzes a customer's purchases and actions on a website and then uses the data to recommend complementary products.
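A very simple version of this idea counts which items are bought together. Real recommendation engines are far more elaborate; the co-occurrence approach and the basket data below are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical purchase history: one set of items per order
baskets = [
    {"tent", "sleeping bag", "lantern"},
    {"tent", "sleeping bag"},
    {"tent", "stove"},
    {"lantern", "batteries"},
]

def recommend(item, baskets, top_n=2):
    """Return the items most often bought together with `item`."""
    counts = Counter()
    for basket in baskets:
        if item in basket:
            for other in basket:
                if other != item:
                    counts[other] += 1
    # sort by count (descending), then by name so ties are deterministic
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_n]]

suggestions = recommend("tent", baskets)
```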

Information Timeliness

is an aspect of information that depends on the situation * Real-time information * Real-time system

Data Profiling

is the process of collecting statistics and information about data in an existing source. Insights extracted from data profiling can determine how easy or difficult it will be to use existing data for other purposes along with providing metrics on data quality.
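As a rough sketch of what such statistics can look like, the function below counts rows, missing values, and distinct values per column. The record layout and values are hypothetical.

```python
# Hypothetical source records; None marks a missing value
records = [
    {"customer_id": 1, "email": "a@example.com", "state": "CO"},
    {"customer_id": 2, "email": None, "state": "CO"},
    {"customer_id": 3, "email": "c@example.com", "state": "co"},
    {"customer_id": 4, "email": "c@example.com", "state": None},
]

def profile(rows):
    """Collect row, missing, and distinct counts for every column."""
    stats = {}
    for row in rows:
        for column, value in row.items():
            col = stats.setdefault(column, {"rows": 0, "missing": 0, "values": set()})
            col["rows"] += 1
            if value is None:
                col["missing"] += 1
            else:
                col["values"].add(value)
    return {
        name: {"rows": c["rows"], "missing": c["missing"], "distinct": len(c["values"])}
        for name, c in stats.items()
    }

report = profile(records)
```

In this made-up sample the report would reveal a duplicated email address and two spellings of the same state ("CO" and "co"), both of which are data quality signals.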

Data Replication

is the process of sharing information to ensure consistency between multiple data sources. Data mining can determine relationships among such internal factors as price, product positioning, or staff skills, and external factors such as economic indicators, competition, and customer demographics

Data Model

logical data structures that detail the relationships among data elements using graphics or pictures

Database

maintains information about various types of objects (inventory), events (transactions), people (employees), and places (warehouses)

lack of data standards

managers needed to perform cross-functional analysis using data from all departments, which differed in granularities, formats, and levels

Algorithms

mathematical formulas placed in software that perform an analysis on a data set
- Anomaly detection: the process of identifying rare or unexpected items or events in a data set that do not conform to other items in the data set
- Outlier: a data value that is numerically distant from most of the other data points in a set of data
- Analysis paralysis: occurs when the user goes into an emotional state of over-analysis (or over-thinking) a situation so that a decision or action is never taken, in effect paralyzing the outcome
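One simple way to flag outliers is the z-score: how many standard deviations a value sits from the mean. The method and the threshold of 2.0 are assumptions chosen for this sketch, not something the card prescribes, and the call counts are invented.

```python
from statistics import mean, pstdev

def find_outliers(values, threshold=2.0):
    """Return values whose z-score magnitude exceeds `threshold`."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical, nothing can be distant
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical customer service calls per day
daily_calls = [10, 12, 11, 13, 12, 11, 95]
outliers = find_outliers(daily_calls)
```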

Performance

measures how quickly a system performs a certain process or transaction

Information Cleansing Activities

missing records or attributes, redundant records, missing keys or other required data, erroneous relationships or references, inaccurate data

Ineffective Direct Data Access

most data stored in operational databases did not allow users direct access; users had to wait to have their queries or questions answered by MIS professionals who could code SQL

Information Integrity Issues

occur when a system produces incorrect, inconsistent, or duplicate data

Data Gap Analysis

occurs when a company examines its data to determine if it can meet business expectations, while identifying possible data gaps or where missing data might exist

information Inconsistency

occurs when the same data element has different values

Infographics

present the results of data analysis, displaying the patterns, relationships, and trends in a graphical format - Infographics can present the results of large data analysis looking for patterns and relationships that monitor changes in variables over time.

One of the primary goals of a database is to eliminate information redundancy by

recording each piece of information in only one place

Granularity

refers to the extent of detail within the information (fine and detailed or "coarse" and abstract information)

Data Governance

refers to the overall management of the availability, usability, integrity, and security of company data

relational integrity constraint

rules that enforce basic and fundamental information-based constraints
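A typical example of such a rule is refusing an order for a customer that does not exist. The sketch below shows this with a foreign key in SQLite; the table names and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customer(id))"
)
conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Ada')")
conn.execute("INSERT INTO orders (id, customer_id) VALUES (10, 1)")  # valid: customer 1 exists

try:
    # customer 99 does not exist, so the database should refuse this row
    conn.execute("INSERT INTO orders (id, customer_id) VALUES (11, 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
conn.close()
```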

Addressing the above sources of information inaccuracies will

significantly improve the quality of organizational information

Fast data

the application of big data analytics to smaller data sets in near-real-time or real-time in order to solve a problem or create business value

Pattern recognition analysis

the classification or labeling of an identified pattern in the machine learning process

data aggregation

the collection of data from various sources for the purpose of data processing

Data stewardship

the management and oversight of an organization's data assets to help provide business users with high-quality data that is easily accessible in a consistent manner

Master Data Management (MDM)

the practice of gathering data and ensuring that it is uniform, accurate, consistent, and complete, including such entities as customers, suppliers, products, sales, employees, and other critical entities that are commonly integrated across organizational systems

Business Intelligence dashboards

track corporate metrics such as critical success factors and key performance indicators and include advanced capabilities such as interactive controls allowing users to manipulate data for analysis. The majority of business intelligence software vendors offer a number of different data visualization tools and business intelligence dashboards

Information is stored in databases

true

The billing system has "accounts payable" customer contact information

true

The customer service system has the "product user" customer contact information

true

The marketing and sales systems have "decision maker" customer contact information

true

inadequate data usefulness

users could not get the data they needed; what was collected was not always useful for intended purposes

Data mining tools

use a variety of techniques to find patterns and relationships in large volumes of information that predict future behavior and guide decision making

Behavioral analysis

using data about people's behaviors to understand intent and predict future actions

Increased Information Integrity (Quality)

• Information integrity - measures the quality of information
• Integrity constraint - rules that help ensure the quality of information
  - Relational integrity constraint
  - Business-critical integrity constraint

