Ch 6


attributes

(also called columns or fields) are the data elements associated with an entity. For example, MUSICIAN ID and MUSICIAN NAME.

entity

(also referred to as a table) stores information about a person, place, thing, transaction, or event. For example, MUSICIANS.

Reasons Business Analysis Is Difficult Using Operational Databases

- inconsistent data definitions
- lack of data standards
- poor data quality
- inadequate data usefulness
- ineffective direct data access

4 common characteristics of big data

- variety
- veracity
- volume
- velocity

Five Common Characteristics of High-Quality Information

1) accurate 2) complete 3) consistent 4) timely 5) unique

Data Mining Process Model Activities (not highlighted)

1. business understanding - Gain a clear understanding of the business problem that must be solved and how it impacts the company.
2. data understanding - Analyze all current data along with identifying any data quality issues.
3. data preparation - Gather and organize the data in the correct formats and structures for analysis.
4. data modeling - Apply mathematical techniques to identify trends and patterns in the data.
5. evaluation - Analyze the trends and patterns to assess the potential for solving the business problem.
6. deployment - Deploy the discoveries to the organization for work in everyday business.

Advanced Data Analytics (not highlighted)

Behavioral analysis - Uses data about people's behaviors to understand intent and predict future actions.
Correlation analysis - Determines a statistical relationship between variables, often for the purpose of identifying predictive factors among the variables.
Exploratory data analysis - Identifies patterns in data, including outliers, uncovering the underlying structure to understand relationships between the variables.
Pattern recognition analysis - The classification or labeling of an identified pattern in the machine learning process.
Social media analysis - Analyzes text flowing across the Internet, including unstructured text from blogs and messages.
Speech analysis - The process of analyzing recorded calls to gather information; brings structure to customer interactions and exposes information buried in customer contact center interactions with an enterprise. Speech analysis is heavily used in the customer service department to help improve processes by identifying angry customers and routing them to the appropriate customer service representative.
Text analysis - Analyzes unstructured data to find trends and patterns in words and sentences. Text mining a firm's customer support email might identify which customer service representative is best able to handle the question, allowing the system to forward it to the right person.
Web analysis
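Correlation analysis is the most formula-driven item in this list. A minimal sketch, assuming Pearson's correlation coefficient as the statistic (the card does not name one) and using hypothetical ad-spend and sales figures:

```python
import math

def pearson_correlation(xs, ys):
    """Return Pearson's correlation coefficient r for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Covariance and variances, left unscaled because the 1/n factors cancel.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: monthly ad spend (thousands) vs. units sold (hundreds).
ad_spend = [10, 20, 30, 40, 50]
units_sold = [12, 24, 33, 41, 55]
r = pearson_correlation(ad_spend, units_sold)
print(f"r = {r:.3f}")  # values near +1 indicate a strong positive linear relationship
```

An r close to +1 or -1 suggests one variable may be a useful predictive factor for the other; correlation alone does not establish cause.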

Data Mining Techniques

Estimation - determines values for an unknown continuous variable behavior or estimated future value.
Affinity grouping - reveals the relationship between variables along with the nature and frequency of the relationships.
Cluster analysis - a technique used to divide an information set into mutually exclusive groups such that the members of each group are as close together as possible to one another and the different groups are as far apart as possible. Cluster analysis identifies similarities and differences among data sets, allowing similar data sets to be clustered together. For example, a customer database that includes attributes such as name and address, demographic information such as gender and age, and financial attributes such as income and revenue spent can be clustered into groups of similar customers.
Classification - the process of organizing data into categories or groups for its most effective and efficient use.
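Cluster analysis can be illustrated with k-means, one common clustering algorithm swapped in here because the card does not name one. This sketch works on a single hypothetical attribute (annual customer spend) and uses the first k values as starting centroids, which is a simplifying assumption:

```python
def kmeans_1d(values, k, max_iter=100):
    """Group numbers into k clusters by repeatedly assigning each value to its
    nearest centroid and moving each centroid to the mean of its members."""
    if k < 1 or k > len(values):
        raise ValueError("k must be between 1 and len(values)")
    # Simple, deterministic initialization: the first k values become centroids.
    centroids = [float(v) for v in values[:k]]
    clusters = []
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        new_centroids = [
            sum(c) / len(c) if c else centroids[i]  # an empty cluster keeps its centroid
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments have stabilized
        centroids = new_centroids
    return centroids, clusters

# Hypothetical annual spend per customer, in dollars.
annual_spend = [120, 150, 130, 900, 950, 1000]
centroids, clusters = kmeans_1d(annual_spend, k=2)
print(clusters)
```

Each pass keeps members of a group close to their own centroid, which is how the groups end up internally similar and far apart from each other.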

Four primary reasons for low-quality information

1. Online customers intentionally enter inaccurate information to protect their privacy.
2. Different systems have different information entry standards and formats.
3. Data-entry personnel enter abbreviated information to save time or erroneous information by accident.
4. Third-party and external information contains inconsistencies, inaccuracies, and errors.

Structured and Unstructured Data Examples

Structured: sensor data, weblog data, financial data, clickstream data
Unstructured: satellite images, photographic data, video data, text messages

Here are a few examples of how managers can use BI to answer tough business questions:

Where has the business been? Historical perspective offers important variables for determining trends and patterns.
Where is the business now? Looking at the current business situation allows managers to take effective action to solve issues before they grow out of control.
Where is the business going? Setting strategic direction is critical for planning and creating solid business strategies.

data artist

a business analytics specialist who uses visual tools to help people understand complex data

repository

a central location in which data is stored and managed

recommendation engine

a data mining algorithm that analyzes a customer's purchases and actions on a website and then uses the data to recommend complementary products. Netflix uses a recommendation engine to analyze each customer's film-viewing habits to provide recommendations for other customers with Cinematch, its movie recommendation system
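A recommendation engine can be sketched with a simple "customers who bought this also bought" heuristic. This is a hypothetical illustration, not how Cinematch works: each other customer who shares purchases with the target customer votes for the items the target does not yet own, weighted by how many purchases they share. The customer names and products are invented.

```python
from collections import Counter

def recommend(purchases, customer, top_n=3):
    """Recommend items that similar customers bought but `customer` has not."""
    owned = purchases[customer]
    scores = Counter()
    for other, items in purchases.items():
        if other == customer:
            continue
        overlap = len(owned & items)
        if overlap == 0:
            continue  # no shared purchases, so no signal from this customer
        for item in items - owned:
            scores[item] += overlap  # more shared purchases means a stronger vote
    return [item for item, _ in scores.most_common(top_n)]

# Hypothetical purchase histories.
purchases = {
    "ana": {"tent", "sleeping bag"},
    "ben": {"tent", "sleeping bag", "lantern"},
    "cho": {"tent", "stove"},
    "dev": {"kayak"},
}
print(recommend(purchases, "ana"))
```

Here "lantern" ranks above "stove" because ben shares two purchases with ana while cho shares only one.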

primary key

a field (or group of fields) that uniquely identifies a given record in a table. In the table RECORDINGS, the primary key is the field RecordingID that uniquely identifies each record in the table.
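A minimal sketch of a primary key using Python's built-in sqlite3 module, assuming a simplified RECORDINGS table. The RecordingTitle column and the pairing of IDs with titles are hypothetical. Because RecordingID must uniquely identify each record, the database rejects a second row with the same key:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE RECORDINGS ("
    " RecordingID INTEGER PRIMARY KEY,"  # uniquely identifies each record
    " RecordingTitle TEXT NOT NULL)"     # hypothetical column for illustration
)
conn.execute("INSERT INTO RECORDINGS VALUES (1, 'The E.N.D.')")
conn.execute("INSERT INTO RECORDINGS VALUES (2, 'Monkey Business')")

try:
    # A second record with RecordingID 1 would make the key ambiguous.
    conn.execute("INSERT INTO RECORDINGS VALUES (1, 'Elepunk')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)
```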

data warehouse

a logical collection of information, gathered from many different operational databases, that supports business analysis activities and decision-making tasks. Data warehouses go even a step further by standardizing information. Gender, for instance, can be referred to in many ways (Male, Female, M/F, 1/0), but it should be standardized on a data warehouse with one common way of referring to each data element that stores gender (M/F). Standardization of data elements allows for greater accuracy, completeness, and consistency and increases the quality of the information in making strategic business decisions.

information cleansing or scrubbing

a process that weeds out and fixes or discards inconsistent, incorrect, or incomplete information.
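Information cleansing can be sketched as a pass that fixes what it can and discards what it cannot. The record layout and the specific rules below (trimming and lowercasing email, requiring an "@", treating email as the duplicate key) are hypothetical assumptions chosen for illustration:

```python
def cleanse(records):
    """Fix inconsistent values, discard incomplete or incorrect records, drop duplicates."""
    cleaned, seen = [], set()
    for rec in records:
        name = (rec.get("name") or "").strip()
        email = (rec.get("email") or "").strip().lower()  # fix inconsistent formatting
        if not name or not email:
            continue  # discard incomplete information
        if "@" not in email:
            continue  # discard clearly incorrect information
        if email in seen:
            continue  # discard a duplicate of a record already kept
        seen.add(email)
        cleaned.append({"name": name, "email": email})
    return cleaned

# Hypothetical raw customer records.
raw = [
    {"name": "Ann Lee", "email": "ANN@example.com "},
    {"name": "Ann Lee", "email": "ann@example.com"},
    {"name": "", "email": "bo@example.com"},
    {"name": "Cy Diaz", "email": "not-an-email"},
    {"name": "Dee Park", "email": "dee@example.com"},
]
print(cleanse(raw))
```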

data lake

a storage repository that holds a vast amount of raw data in its original format until the business needs it. While a traditional data warehouse stores data in files or folders, a data lake uses a flat architecture to store data

data driven decision management

an approach to business governance that values decisions that can be backed up with verifiable data. The success of the data-driven approach is reliant upon the quality of the data gathered and the effectiveness of its analysis and interpretation.

Dynamic website information is stored in a dynamic catalog

an area of a website that stores information about products in a database.

relational integrity constraints

are rules that enforce basic and fundamental information-based constraints

integrity constraints

are rules that help ensure the quality of information. The database design needs to consider integrity constraints. Types of integrity constraints: (1) relational and (2) business-critical.

comparative analysis

can compare two or more data sets to identify patterns and trends. Employees can base their decisions on data sets, experience, or knowledge and, preferably, a combination of all three.

data dictionary

compiles all of the metadata about the data elements in the data model. Looking at a data model along with reviewing the data dictionary provides tremendous insight into the database's functions, purpose, and business rules.

data mart

contains a subset of data warehouse information

content creator and content editor

content creator - is the person responsible for creating the original website content. content editor - the person responsible for updating and maintaining website content.

database management system (DBMS)

creates, reads, updates, and deletes data in a database while controlling access and security. Managers send requests to the DBMS, and the DBMS performs the actual manipulation of the data in the database.
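The four DBMS operations (create, read, update, delete) can be sketched with SQLite standing in for the DBMS via Python's sqlite3 module. The MusicianNotes column and the updated note text are hypothetical. SQLite has no user accounts, so the access-and-security role of a full DBMS is not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for the DBMS
conn.execute(
    "CREATE TABLE MUSICIANS ("
    " MusicianID INTEGER PRIMARY KEY, MusicianName TEXT, MusicianNotes TEXT)"
)

# Create
conn.execute("INSERT INTO MUSICIANS VALUES (?, ?, ?)",
             (3, "Lady Gaga", "Do not bring young kids to live shows"))
# Read
row = conn.execute(
    "SELECT MusicianName, MusicianNotes FROM MUSICIANS WHERE MusicianID = ?", (3,)
).fetchone()
print(row)
# Update (hypothetical new note)
conn.execute("UPDATE MUSICIANS SET MusicianNotes = ? WHERE MusicianID = ?",
             ("Touring next year", 3))
updated = conn.execute("SELECT MusicianNotes FROM MUSICIANS WHERE MusicianID = 3").fetchone()[0]
# Delete
conn.execute("DELETE FROM MUSICIANS WHERE MusicianID = ?", (3,))
remaining = conn.execute("SELECT COUNT(*) FROM MUSICIANS").fetchone()[0]
conn.commit()
```

The program only sends requests; the DBMS performs the actual reads and writes against the stored data.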

data warehousing components

data mart, information cleansing, business intelligence

business rule

defines how a company performs certain aspects of its business and typically results in either a yes/no or true/false answer. Stating that merchandise returns are allowed within 10 days of purchase is an example of a business rule.
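The return-policy example can be written as a rule that yields a true/false answer. A minimal sketch; reading "within 10 days" as inclusive of day 10 is an assumption:

```python
from datetime import date

# Assumption: "within 10 days" is read as inclusive of day 10.
RETURN_WINDOW_DAYS = 10

def return_allowed(purchase_date: date, return_date: date) -> bool:
    """Business rule: merchandise returns are allowed within 10 days of purchase."""
    days_elapsed = (return_date - purchase_date).days
    return 0 <= days_elapsed <= RETURN_WINDOW_DAYS

print(return_allowed(date(2024, 3, 1), date(2024, 3, 11)))  # True (10 days later)
print(return_allowed(date(2024, 3, 1), date(2024, 3, 12)))  # False (11 days later)
```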

data visualization

describes technologies that allow users to see or visualize data to transform information into a business perspective.

business-critical integrity constraints

enforce business rules vital to an organization's success and often require more insight and knowledge than relational integrity constraints

DBMSs use three primary data models for organizing information: hierarchical, network, and the relational database, the most prevalent

hierarchical, network and relational database (most prevalent)

source data

identifies the primary location where data is collected. Source data can include invoices, spreadsheets, time sheets, transactions, and electronic sources such as other databases. Managers send their information requests to the MIS department, where a dedicated person compiles the various reports. In some situations, responses can take days, by which time the information may be outdated and opportunities lost. Many organizations find themselves in the position of being data rich and information poor. Even in today's electronic world, managers struggle with the challenge of turning their business data into business intelligence.

data validation

includes the tests and evaluations used to determine compliance with data governance policies to ensure correctness of data. Data validation helps to ensure that every data value is correct and accurate.
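Data validation can be sketched as a set of tests that each record must pass. The order fields and the three rules below are hypothetical stand-ins for an organization's actual data governance policies:

```python
import re

# Hypothetical validation rules for an order record.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def validate_order(order):
    """Return a list of rule violations; an empty list means the record passed."""
    errors = []
    if not order.get("customer_id"):
        errors.append("customer_id is required")
    quantity = order.get("quantity")
    if not isinstance(quantity, int) or quantity <= 0:
        errors.append("quantity must be a positive whole number")
    if not EMAIL_PATTERN.match(order.get("email") or ""):
        errors.append("email is not well formed")
    return errors

print(validate_order({"customer_id": "C1", "quantity": 2, "email": "a@example.com"}))
print(validate_order({"customer_id": "", "quantity": 0, "email": "not-an-email"}))
```

Returning every violation, rather than stopping at the first, gives data stewards a fuller picture of what needs fixing.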

data broker

is a business that collects personal information about consumers and sells that information to other organizations.

record

is a collection of related data elements. For example: (3, Lady Gaga, gag.tiff, "Do not bring young kids to live shows")

foreign key

is a primary key of one table that appears as an attribute in another table and acts to provide a logical relationship between the two tables. For instance, Black Eyed Peas in Figure 6.7 is one of the musicians appearing in the MUSICIANS table. Its primary key, MusicianID, is "2." Notice that MusicianID also appears as an attribute in the RECORDINGS table. By matching these attributes, you create a relationship between the MUSICIANS and RECORDINGS tables that states the Black Eyed Peas (MusicianID 2) have several recordings, including The E.N.D., Monkey Business, and Elepunk. In essence, MusicianID in the RECORDINGS table creates a logical relationship (who was the musician that made the recording) to the MUSICIANS table. Creating the logical relationship between the tables allows managers to search the data and turn it into useful information.
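The MUSICIANS/RECORDINGS relationship described above can be sketched with sqlite3. Table and column names follow the card where given; the RecordingTitle column is hypothetical. The join matches MusicianID across both tables, and with foreign-key enforcement turned on, SQLite also applies a relational integrity constraint by rejecting a recording for a musician who does not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE MUSICIANS (MusicianID INTEGER PRIMARY KEY, MusicianName TEXT)")
conn.execute(
    "CREATE TABLE RECORDINGS ("
    " RecordingID INTEGER PRIMARY KEY,"
    " RecordingTitle TEXT,"                               # hypothetical column
    " MusicianID INTEGER REFERENCES MUSICIANS(MusicianID))"  # the foreign key
)
conn.execute("INSERT INTO MUSICIANS VALUES (2, 'Black Eyed Peas')")
conn.executemany(
    "INSERT INTO RECORDINGS (RecordingTitle, MusicianID) VALUES (?, ?)",
    [("The E.N.D.", 2), ("Monkey Business", 2), ("Elepunk", 2)],
)

# Matching MusicianID in both tables answers "which recordings did this musician make?"
titles = [
    title for (title,) in conn.execute(
        "SELECT r.RecordingTitle FROM RECORDINGS r "
        "JOIN MUSICIANS m ON r.MusicianID = m.MusicianID "
        "WHERE m.MusicianName = ? ORDER BY r.RecordingID",
        ("Black Eyed Peas",),
    )
]
print(titles)

# A recording that points at a nonexistent musician violates the constraint.
try:
    conn.execute("INSERT INTO RECORDINGS (RecordingTitle, MusicianID) VALUES ('Unknown', 99)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
```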

data map

is a technique for establishing a match, or balance, between the source data and the target data warehouse. This technique identifies data shortfalls and recognizes data issues. Data maps can also alert managers to inconsistencies or help determine the cause and effects of enterprise-wide business decisions.

data point

is an individual item on a graph or a chart. Organizational data includes far more than simple structured data elements in a database; the set of data also includes unstructured data such as voice mail, customer phone calls, text messages, and video clips, along with numerous new forms of data, such as tweets from Twitter.

data-driven website

is an interactive website kept constantly updated and relevant to the needs of its customers using a database

data set

is an organized collection of data.

dirty data

is erroneous or flawed data

data steward

is responsible for ensuring the policies and procedures are implemented across the organization and acts as a liaison between the MIS department and the business

information redundancy

is the duplication of data, or the storage of the same data in multiple places

data stewardship

is the management and oversight of an organization's data assets to help provide business users with high-quality data that is easily accessible in a consistent manner.

master data management (MDM)

is the practice of gathering data and ensuring that it is uniform, accurate, consistent, and complete, including such entities as customers, suppliers, products, sales, employees, and other critical entities that are commonly integrated across organizational systems

data latency

is the time it takes for data to be stored or retrieved.

data models

logical data structures that detail the relationships among data elements by using graphics or pictures.

database

maintains information about various types of objects (inventory), events (transactions), people (employees), and places (warehouses)

real-time information

means immediate, up-to-date information.

data visualization tools

move beyond Excel graphs and charts into sophisticated analysis techniques such as controls, instruments, maps, time-series graphs, and more

data integrity issues

occur when a system produces incorrect, inconsistent, or duplicate data. Data integrity issues can cause managers to consider the system reports invalid and make decisions based on other sources.

data gap analysis

occurs when a company examines its data to determine if it can meet business expectations, while identifying possible data gaps or where missing data might exist.

information inconsistency

occurs when the same data element has different values. For example, a customer who marries and changes her last name may end up stored under two different last names.
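Information inconsistency can be detected by grouping records on an identifier and flagging identifiers whose values disagree. A minimal sketch with hypothetical customer records from two systems:

```python
from collections import defaultdict

def find_inconsistencies(records, key, field):
    """Return {key_value: set_of_field_values} for keys with more than one value."""
    values = defaultdict(set)
    for rec in records:
        values[rec[key]].add(rec[field])
    return {k: v for k, v in values.items() if len(v) > 1}

# Hypothetical: the same customer stored in a billing and a shipping system.
records = [
    {"customer_id": 17, "last_name": "Smith", "system": "billing"},
    {"customer_id": 17, "last_name": "Jones", "system": "shipping"},
    {"customer_id": 18, "last_name": "Park", "system": "billing"},
]
print(find_inconsistencies(records, "customer_id", "last_name"))
```

Flagging the conflict does not say which value is correct; resolving it still requires a business decision or a master data source.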

analysis paralysis

occurs when the user goes into an emotional state of overanalyzing (or overthinking) a situation so that a decision or action is never taken, in effect paralyzing the outcome. In the time of big data, analysis paralysis is a growing problem.

Data Mining Modeling Techniques for Predictions

optimization model - A statistical process that finds the way to make a design, system, or decision as effective as possible.
forecasting model - Makes predictions based on time-series information (time-stamped information collected at a particular frequency).
regression model - A statistical process for estimating the relationships among variables.
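A regression model can be sketched with ordinary least squares, which estimates the straight-line relationship between two variables. Fitting it to a time series and extending the line is one simple forecasting approach, shown here only as an illustration with hypothetical monthly sales:

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical time series: month number vs. sales.
months = [1, 2, 3, 4, 5]
sales = [100, 110, 120, 130, 140]
intercept, slope = fit_line(months, sales)
forecast_month_6 = intercept + slope * 6  # extend the fitted trend one month ahead
print(intercept, slope, forecast_month_6)  # 90.0 10.0 150.0
```

A straight-line trend assumes the past pattern continues; real forecasting models often also account for seasonality.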

physical and logical view of information

physical view of information - deals with the physical storage of information on a storage device.
logical view of information - focuses on how individual users logically access information to meet their own particular business needs.

infographics

present the results of data analysis, displaying the patterns, relationships, and trends in a graphical format.

extraction, transformation, and loading (ETL)

process that extracts information from internal and external databases, transforms it using a common set of enterprise definitions, and loads it into a data warehouse
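The three ETL steps can be sketched end to end using the gender example from the data warehouse card. The two source systems, their rows, and the customer_dim table are hypothetical, and which of 1/0 means male or female is an assumption:

```python
import sqlite3

# Extract: two hypothetical source systems that encode gender differently.
crm_rows = [("C1", "Male"), ("C2", "Female")]
billing_rows = [("C3", "1"), ("C4", "0")]

# Transform: apply one enterprise-wide definition (M/F) to every source.
# Assumption: this billing system uses 1 for male and 0 for female.
GENDER_CODES = {"male": "M", "m": "M", "1": "M", "female": "F", "f": "F", "0": "F"}

def transform(rows):
    # An unrecognized code raises KeyError instead of loading unstandardized data.
    return [(cid, GENDER_CODES[g.strip().lower()]) for cid, g in rows]

# Load: write the standardized rows into a warehouse table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE customer_dim (customer_id TEXT PRIMARY KEY, gender TEXT)")
warehouse.executemany(
    "INSERT INTO customer_dim VALUES (?, ?)",
    transform(crm_rows) + transform(billing_rows),
)
warehouse.commit()
print(warehouse.execute("SELECT * FROM customer_dim ORDER BY customer_id").fetchall())
```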

distributed computing

processes and manages algorithms across many machines in a computing environment

real time systems

provide real-time information in response to requests. Many organizations use real-time systems to uncover key corporate transactional information. The growing demand for real-time information stems from organizations' need to make faster and more effective decisions, keep smaller inventories, operate more efficiently, and track performance more carefully.

metadata

provides details about data. For example, metadata for an image could include its size, resolution, and date created. Metadata about a text document could contain document length, date created, author's name, and summary.

Two primary tools are available for retrieving information from a DBMS...

query-by-example (QBE) tool - helps users graphically design the answer to a question against a database.
structured query language (SQL) - asks users to write lines of code to answer questions against a database.
Managers typically interact with QBE tools, and MIS professionals have the skills required to code SQL.

information granularity

refers to the extent of detail within the information (fine and detailed or coarse and abstract).

data governance

refers to the overall management of the availability, usability, integrity, and security of company data

relational database model and relational database management system

relational database model - stores information in the form of logically related two-dimensional tables.
relational database management system - allows users to create, read, update, and delete data in a relational database.

static and dynamic information

static information - includes fixed data incapable of change in the event of a user action.
dynamic information - includes data that change based on user actions.
For example, static websites supply only information that will not change until the content editor changes the information. Dynamic information changes when a user requests information.

fast data

the application of big data analytics to smaller data sets in near-real or real-time in order to solve a problem or create business value. The term fast data is often associated with business intelligence, and the goal is to quickly gather and mine structured and unstructured data so that action can be taken.

data aggregation

the collection of data from various sources for the purpose of data processing

information cube

the common term for the representation of multidimensional information. For example, a cube (Cube a) can represent store information (the layers), product information (the rows), and promotion information (the columns).
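An information cube with store, product, and promotion dimensions can be sketched as a mapping from (store, product, promotion) to a value. The sales figures are hypothetical. Slicing on one store pulls out a single layer, and summing across dimensions moves from fine to coarse granularity (drilling up):

```python
from collections import defaultdict

# Hypothetical sales facts keyed by (store, product, promotion).
cube = {
    ("Store 1", "Product A", "Promo X"): 120,
    ("Store 1", "Product B", "Promo X"): 80,
    ("Store 2", "Product A", "Promo Y"): 95,
    ("Store 2", "Product B", "Promo X"): 60,
}

def slice_store(cube, store):
    """One layer of the cube: every (product, promotion) cell for a single store."""
    return {(p, promo): v for (s, p, promo), v in cube.items() if s == store}

def roll_up_by_product(cube):
    """Drill up: total each product across all stores and promotions."""
    totals = defaultdict(int)
    for (_, product, _), value in cube.items():
        totals[product] += value
    return dict(totals)

print(slice_store(cube, "Store 1"))
print(roll_up_by_product(cube))
```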

virtualization

the creation of a virtual (rather than actual) version of computing resources, such as an operating system, a server, a storage device, or network resources

data mining

the process of analyzing data to extract information not offered by the raw data alone. Data mining can also begin at a summary information level (coarse granularity) and progress through increasing levels of detail (drilling down) or the reverse (drilling up). Data mining has three steps:
1. Data: foundation for data-directed decision making.
2. Discovery: process of identifying new patterns, trends, and insights.
3. Deployment: process of implementing discoveries to drive success.

data profiling

the process of collecting statistics and information about data in an existing source. Insights extracted from data profiling can determine how easy or difficult it will be to use existing data for other purposes along with providing metrics on data quality
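Data profiling can be sketched as collecting a few statistics about one column of an existing source. The chosen statistics and the discount-rate values are hypothetical; treating None and empty strings as missing is an assumption:

```python
def profile_column(values):
    """Collect basic statistics about one column of an existing source."""
    present = [v for v in values if v is not None and v != ""]  # assumption: None/"" are missing
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
    }

# Hypothetical discount-rate column pulled from an existing system.
discount_rates = [0.05, 0.10, None, 0.05, "", 0.20]
profile = profile_column(discount_rates)
print(profile)
```

A high "missing" count or an unexpected min/max is an early signal of how much cleansing the data will need before it can be reused.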

data element or (data field)

the smallest or basic unit of information. Data elements can include a customer's name, address, email, discount rate, preferred shipping method, product name, quantity ordered, and so on.

data quality audits

are performed by an organization to determine the accuracy and completeness of its data.

business intelligence dashboards

track corporate metrics such as critical success factors and key performance indicators and include advanced capabilities such as interactive controls, allowing users to manipulate data for analysis.

data mining tools

use a variety of techniques to find patterns and relationships in large volumes of information that predict future behavior and guide decision making. Data mining uncovers trends and patterns, which analysts use to build models that, when exposed to new information sets, perform a variety of information analysis functions. Data mining tools for data warehouses help users uncover business intelligence in their data.

