ITM Chapter 6


Attributes

(also called columns or fields) are the data elements associated with an entity.

6 phases in data mining

1. Business understanding
2. Data understanding
3. Data preparation
4. Data modeling
5. Evaluation
6. Deployment

Data artist

A business analytics specialist who uses visual tools to help people understand complex data.

Record

A collection of related data elements.

Data warehouse

A logical collection of information, gathered from many operational databases, that supports business analysis activities and decision-making tasks.

Extraction, transformation, and loading (ETL)

A process that extracts information from internal and external databases, transforms it using a common set of enterprise definitions, and loads it into a data warehouse.
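A minimal sketch of the three ETL steps, assuming two hypothetical source systems (an in-store system and a web store) that record the same sale with different field names, units, and date formats. Python's built-in sqlite3 module stands in for the data warehouse; every table and field name here is illustrative only.

```python
import sqlite3

# Hypothetical operational sources that describe the same sale differently.
store_sales = [{"sku": "A1", "amount_usd": 19.99, "date": "2024-03-01"}]
web_sales = [{"product": "a1", "amount_cents": 2499, "day": "03/02/2024"}]

def extract():
    """Extract: pull raw rows from each source, tagged with their origin."""
    return [("store", r) for r in store_sales] + [("web", r) for r in web_sales]

def transform(tagged_rows):
    """Transform: map every row onto one common enterprise definition
    (upper-case SKU, amount in dollars, ISO 8601 date)."""
    conformed = []
    for source, r in tagged_rows:
        if source == "store":
            conformed.append((r["sku"].upper(), r["amount_usd"], r["date"]))
        else:
            month, day, year = r["day"].split("/")
            conformed.append((r["product"].upper(), r["amount_cents"] / 100,
                              f"{year}-{month}-{day}"))
    return conformed

def load(rows, conn):
    """Load: insert the conformed rows into a warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales_fact "
                 "(sku TEXT, amount REAL, sale_date TEXT)")
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", rows)
    conn.commit()

warehouse = sqlite3.connect(":memory:")
load(transform(extract()), warehouse)
print(warehouse.execute("SELECT * FROM sales_fact ORDER BY sale_date").fetchall())
# [('A1', 19.99, '2024-03-01'), ('A1', 24.99, '2024-03-02')]
```

After the transform step, both rows share one definition of a sale, which is what lets the warehouse compare them directly.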

Regression model

A statistical process for estimating the relationships among variables. Includes many techniques for modeling and analyzing several variables when the focus is on the relationship between a dependent variable and one or more independent variables.
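As an illustration, here is a minimal sketch of one regression technique, ordinary least squares with a single independent variable. The advertising-spend and sales figures are hypothetical and chosen to fit a line exactly so the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one independent variable.
    Returns (intercept, slope) of the line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data: advertising spend (independent) vs. sales (dependent).
ad_spend = [1, 2, 3, 4, 5]
sales = [3, 5, 7, 9, 11]
a, b = fit_line(ad_spend, sales)
print(a, b)  # 1.0 2.0
```

The slope says each extra unit of advertising is associated with two more units of sales in this made-up data; real data would rarely fit so cleanly.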

Optimization model

A statistical process that finds the way to make a design, system, or decision as effective as possible; for example, finding the values of controllable variables that determine maximal productivity or minimal waste.
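A minimal sketch of the idea, assuming a hypothetical productivity function with diminishing returns and one controllable variable (crew size). It uses a brute-force search over every allowed value; real optimization models usually rely on techniques such as linear programming or numerical solvers instead.

```python
def productivity(workers):
    """Hypothetical model: each added worker raises output, but crowding
    causes diminishing (and eventually negative) returns."""
    return 48 * workers - 2 * workers ** 2

# Controllable variable: crew size, constrained to 0..30 workers.
candidates = range(0, 31)
best = max(candidates, key=productivity)
print(best, productivity(best))  # 12 288
```

Exhaustive search is only practical when the set of candidate values is small, which is why it serves here purely as an illustration.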

Data lake

A storage repository that holds a vast amount of raw data in its original format until the business needs it. While a traditional data warehouse stores data in files or folders, a data lake uses a flat architecture to store data.

Cluster Analysis

A technique used to divide an information set into mutually exclusive groups such that the members of each group are as close together as possible to one another and the different groups are as far apart as possible
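One common clustering technique is k-means. The sketch below is a minimal one-dimensional version, assuming hypothetical customer-spend figures and hand-picked starting centroids so the result is deterministic; production tools typically choose starting points automatically and work in many dimensions.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Minimal k-means: assign each value to its nearest centroid, then move
    each centroid to the mean of its group, and repeat."""
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            groups[nearest].append(v)
        # An empty group keeps its previous centroid.
        centroids = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
    return centroids, groups

# Hypothetical annual spend per customer, in dollars.
spend = [120, 135, 150, 900, 950, 1000, 4000, 4200]
centers, groups = kmeans_1d(spend, centroids=[100, 1000, 5000])
print(groups)   # [[120, 135, 150], [900, 950, 1000], [4000, 4200]]
print(centers)  # [135.0, 950.0, 4100.0]
```

Members of each resulting group sit close together, while the three groups are far apart, matching the definition above.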

Relational database management system

Allows users to create, read, update, and delete data in a relational database. Although the hierarchical and network models are important, this text focuses only on the relational database model.

Dynamic catalog

An area of a website that stores information about products in a database.

Integrity constraints

Are rules that help ensure the quality of information. The database design needs to take them into account.

Structured query language (SQL)

Requires users to write lines of code to answer questions against a database. Managers typically interact with QBE tools, while MIS professionals have the skills required to code SQL.

Data dictionary

Compiles all of the metadata about the data elements in the data model. Reviewing a data model together with its data dictionary provides tremendous insight into the database's functions, purpose, and business rules.

Database management system (DBMS)

Creates, reads, updates, and deletes data in a database while controlling access and security. Managers send requests to the DBMS, and the DBMS performs the actual manipulation of the data in the database.

Data visualization

Describes technologies that allow users to "see" or visualize data to transform information into a business perspective.

Data quality audits

Determine the accuracy and completeness of an organization's data. Most organizations determine a percentage of accuracy and completeness high enough to make good decisions at a reasonable cost, such as 85 percent accurate and 65 percent complete.
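A minimal sketch of how an audit might compute these two percentages, assuming hypothetical customer records and hypothetical accuracy checks (a five-digit ZIP code and an email address with a dotted domain). Completeness counts filled values out of all values; accuracy counts correct values out of the filled ones. The 85/65 targets are the example figures from the definition above.

```python
# Hypothetical customer records; None marks a missing value.
records = [
    {"name": "Ann Lee", "zip": "80302", "email": "ann@example.com"},
    {"name": "Bo Chan", "zip": None,    "email": "bo@example.com"},
    {"name": "Cy Diaz", "zip": "8030",  "email": None},
    {"name": "Di Park", "zip": "10001", "email": "di@example"},
]

def is_accurate(field, value):
    """Hypothetical accuracy checks per field."""
    if field == "zip":
        return value.isdigit() and len(value) == 5
    if field == "email":
        return "@" in value and "." in value.split("@")[-1]
    return True  # no check defined for other fields

def audit(records):
    """Return (accuracy %, completeness %) across all fields of all records."""
    values = [(f, v) for r in records for f, v in r.items()]
    filled = [(f, v) for f, v in values if v is not None]
    accurate = [fv for fv in filled if is_accurate(*fv)]
    return 100 * len(accurate) / len(filled), 100 * len(filled) / len(values)

accuracy, completeness = audit(records)
meets_target = accuracy >= 85 and completeness >= 65  # example targets from the text
print(f"{accuracy:.0f}% accurate, {completeness:.0f}% complete, meets target: {meets_target}")
# 80% accurate, 83% complete, meets target: False
```

In this made-up sample the data is complete enough but falls short of the 85 percent accuracy example target.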

Estimation Analysis

Determines values for an unknown continuous behavior or estimated future value.

Business-critical integrity constraints

Enforces business rules vital to an organization's success and often requires more insight and knowledge than relational integrity constraints.

Market basket analysis

Evaluates such items as websites and checkout scanner information to detect customers' buying behavior and predict future behavior by identifying affinities among customers' choices of products and services (see Figure 6.30). Is frequently used to develop marketing campaigns for cross-selling products and services (especially in banking, insurance, and finance) and for inventory control, shelf-product placement, and other retail and marketing applications.
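A minimal sketch of the core counting behind market basket analysis, assuming a handful of hypothetical checkout baskets. Support is the share of all baskets containing both items; confidence is the share of baskets with the first item that also contain the second. Real tools apply algorithms such as Apriori to far larger data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical checkout-scanner baskets.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "cereal"},
    {"bread", "butter", "jam"},
]

# How often each item, and each unordered pair of items, appears.
item_counts = Counter(item for b in baskets for item in b)
pair_counts = Counter(pair for b in baskets for pair in combinations(sorted(b), 2))

def rule_stats(a, b):
    """Support and confidence for the rule 'customers who buy a also buy b'."""
    together = pair_counts[tuple(sorted((a, b)))]
    support = together / len(baskets)
    confidence = together / item_counts[a]
    return support, confidence

print(rule_stats("butter", "bread"))  # (0.6, 1.0)
print(rule_stats("bread", "butter"))  # (0.6, 0.75)
```

A high-confidence rule such as "butter implies bread" is the kind of affinity a retailer might use for cross-selling or shelf placement.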

Query-by-example (QBE) tool

Helps users graphically design the answer to a question against a database.

Real-time information

Immediate, up-to-date information.

Data validation

Includes the tests and evaluations used to determine compliance with data governance policies to ensure correctness of data. Helps to ensure that every data value is correct and accurate.
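A minimal sketch of rule-based data validation, assuming three hypothetical governance rules for employee records (a positive ID, a well-formed email, and an hourly rate between 10 and 500). The field names and bounds are invented for illustration.

```python
import re

# Hypothetical validation rules drawn from a data governance policy.
RULES = {
    "employee_id": lambda v: isinstance(v, int) and v > 0,
    "email": lambda v: isinstance(v, str)
        and re.fullmatch(r"[^@\s]+@[^@\s]+\.[a-z]{2,}", v) is not None,
    "hourly_rate": lambda v: isinstance(v, (int, float)) and 10 <= v <= 500,
}

def validate(record):
    """Return the fields that are missing or fail their rule."""
    return [field for field, rule in RULES.items()
            if field not in record or not rule(record[field])]

print(validate({"employee_id": 42, "email": "kim@example.com", "hourly_rate": 31.5}))
# []
print(validate({"employee_id": -1, "email": "kim@", "hourly_rate": 3}))
# ['employee_id', 'email', 'hourly_rate']
```

Returning the list of failing fields, rather than a single yes/no, makes it easier to report exactly which values break policy.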

Primary key

Is a field (or group of fields) that uniquely identifies a given record in a table. In the table RECORDINGS, the primary key is the field RecordingID that uniquely identifies each record in the table.

A foreign key

Is a primary key of one table that appears as an attribute in another table and acts to provide a logical relationship between the two tables.
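A minimal sketch of primary and foreign keys using Python's built-in sqlite3 module. The RECORDINGS table and its RecordingID primary key come from the example above; the ARTISTS table, its columns, and the sample values are hypothetical. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on

# ARTISTS is a hypothetical parent table; RecordingID is the primary key of RECORDINGS,
# and RECORDINGS.ArtistID is a foreign key pointing at ARTISTS.ArtistID.
conn.execute("CREATE TABLE ARTISTS (ArtistID INTEGER PRIMARY KEY, ArtistName TEXT)")
conn.execute("""CREATE TABLE RECORDINGS (
    RecordingID INTEGER PRIMARY KEY,
    Title       TEXT,
    ArtistID    INTEGER REFERENCES ARTISTS(ArtistID)
)""")

def try_insert(sql):
    """Run an INSERT; return False if an integrity constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

print(try_insert("INSERT INTO ARTISTS VALUES (1, 'Sample Artist')"))        # True
print(try_insert("INSERT INTO RECORDINGS VALUES (10, 'Sample Album', 1)"))  # True
print(try_insert("INSERT INTO RECORDINGS VALUES (10, 'Duplicate ID', 1)"))  # False: primary key must be unique
print(try_insert("INSERT INTO RECORDINGS VALUES (11, 'Orphan', 99)"))       # False: no artist 99 exists
```

The last two inserts show the two keys doing their jobs: the primary key keeps each recording unique, and the foreign key keeps the logical relationship between the tables valid.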

Information cleansing or scrubbing

Is a process that weeds out and fixes or discards inconsistent, incorrect, or incomplete information.
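A minimal sketch of information cleansing, assuming hypothetical customer rows and a hypothetical lookup table for state spellings. It fixes inconsistent formatting, discards incomplete or unfixable rows, and drops duplicates by customer ID, keeping the first occurrence.

```python
# Hypothetical raw customer rows pulled from two systems.
raw = [
    {"id": 1, "name": " ann lee ", "state": "colorado"},
    {"id": 1, "name": "Ann Lee",   "state": "CO"},   # duplicate of id 1
    {"id": 2, "name": "BO CHAN",   "state": "Co."},
    {"id": 3, "name": "",          "state": "NY"},   # incomplete: no name
]

# Hypothetical lookup that fixes inconsistent state spellings.
STATE_CODES = {"colorado": "CO", "co": "CO", "co.": "CO", "ny": "NY"}

def cleanse(rows):
    """Fix inconsistent values, discard incomplete rows, drop duplicates."""
    seen, clean = set(), []
    for r in rows:
        name = " ".join(r["name"].split()).title()   # trim spaces, normalize case
        state = STATE_CODES.get(r["state"].strip().lower())
        if not name or state is None:                 # incomplete or unfixable
            continue
        if r["id"] in seen:                           # duplicate customer ID
            continue
        seen.add(r["id"])
        clean.append({"id": r["id"], "name": name, "state": state})
    return clean

print(cleanse(raw))
# [{'id': 1, 'name': 'Ann Lee', 'state': 'CO'}, {'id': 2, 'name': 'Bo Chan', 'state': 'CO'}]
```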

Dirty data

Is erroneous or flawed data

Data steward

Is responsible for ensuring the policies and procedures are implemented across the organization and acts as a liaison between the MIS department and the business.

Data stewardship

Is the management and oversight of an organization's data assets to help provide business users with high-quality data that is easily accessible in a consistent manner.

Data replication

Is the process of sharing information to ensure consistency between multiple data sources.

Analytics

Is the science of fact-based decision making. Uses software-based algorithms and statistics to derive meaning from data. Advanced analytics uses data patterns to make forward-looking predictions to explain to the organization where it is headed.

Information integrity issues

Occur when a system produces incorrect, inconsistent, or duplicate data. Data integrity issues can cause managers to consider the system reports invalid and make decisions based on other sources.

Information inconsistency

Occurs when the same data element has different values.

Analysis paralysis

Occurs when the user goes into an emotional state of overanalyzing (or overthinking) a situation so that a decision or action is never taken, in effect paralyzing the outcome. In the time of big data, analysis paralysis is a growing problem.

Real-time systems

Provide real-time information in response to requests. Many organizations use these to uncover key corporate transactional information. The growing demand for them comes from organizations' need to make faster and more effective decisions, keep smaller inventories, operate more efficiently, and track performance more carefully.

Metadata

Provides details about data. For example, metadata for an image could include its size, resolution, and date created. Metadata about a text document could contain document length, date created, author's name, and summary.

Data governance

Refers to the overall management of the availability, usability, integrity, and security of company data.

Affinity Grouping Analysis

Reveals the relationship between variables along with the nature and frequency of the relationships.

Relational database model

Stores information in the form of logically related two-dimensional tables.

Fast data

The application of big data analytics to smaller data sets in near-real or real-time in order to solve a problem or create business value. The term fast data is often associated with business intelligence, and the goal is to quickly gather and mine structured and unstructured data so that action can be taken.

Master data management (MDM)

The practice of gathering data and ensuring that it is uniform, accurate, consistent, and complete, including such entities as customers, suppliers, products, sales, employees, and other critical entities that are commonly integrated across organizational systems.

Anomaly detection

The process of identifying rare or unexpected items or events in a data set that do not conform to other items in the data set. One of the key advantages of performing advanced analytics is to detect anomalies in the data to ensure they are not used in models creating false results.

Classification Analysis

The process of organizing data into categories or groups for its most effective and efficient use.
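One simple way to place new records into existing categories is a nearest-neighbor classifier. The sketch below assumes hypothetical labeled customers described by annual spend and visits per year; the segment names are invented. The features are left unscaled for brevity, so spend dominates the distance.

```python
# Hypothetical labeled customers: (annual spend $, visits per year) -> segment.
labeled = [((200, 2), "occasional"), ((250, 3), "occasional"),
           ((2000, 24), "loyal"), ((2400, 30), "loyal")]

def classify(point, examples=labeled):
    """1-nearest-neighbor: give a new record the label of its closest example."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))  # squared Euclidean distance
    return min(examples, key=lambda ex: dist(point, ex[0]))[1]

print(classify((300, 4)))    # occasional
print(classify((1800, 20)))  # loyal
```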

Forecasting model

Time-series information is time-stamped information collected at a particular frequency. Forecasts are predictions based on time-series information, allowing users to manipulate the time series for forecasting activities.
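A minimal forecasting sketch, assuming hypothetical monthly sales and using a simple moving average (one of the most basic forecasting techniques, not necessarily the one any given tool uses). The next period is predicted as the mean of the most recent observations.

```python
# Hypothetical time-stamped monthly unit sales, collected at a monthly frequency.
monthly_sales = [("2024-01", 100), ("2024-02", 110), ("2024-03", 120), ("2024-04", 130)]

def moving_average_forecast(series, window=3):
    """Forecast the next period as the mean of the last `window` observations."""
    recent = [value for _, value in series[-window:]]
    return sum(recent) / len(recent)

print(moving_average_forecast(monthly_sales))  # 120.0
```

Because a moving average only looks backward, it lags behind a steady upward trend like this one (the last month was 130, yet the forecast is 120); trend-aware models would adjust for that.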

Business intelligence dashboards

Track corporate metrics such as critical success factors and key performance indicators and include advanced capabilities such as interactive controls, allowing users to manipulate data for analysis.

Four primary traits that help determine the value of information

Type, Timeliness, Quality, Governance

Four Common Characteristics of Big Data

Variety, veracity, volume, velocity

Data models

are logical data structures that detail the relationships among data elements by using graphics or pictures.

Algorithms

are mathematical formulas placed in software that perform an analysis on a data set.

Relational integrity constraints

are rules that enforce basic and fundamental information-based constraints.

Comparative analysis

can compare two or more data sets to identify patterns and trends. Employees can base their decisions on data sets, experience, or knowledge and, preferably, a combination of all three.

Data mart

contains a subset of data warehouse information. To distinguish between data warehouses and data marts, think of data warehouses as having a more organizational focus and data marts as having a functional focus.

Physical view of information

deals with the physical storage of information on a storage device

Business rule

defines how a company performs certain aspects of its business and typically results in either a yes/no or true/false answer. Stating that merchandise returns are allowed within 10 days of purchase is an example of a business rule.
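The return-policy example above can be expressed as a small yes/no check. This is a minimal sketch; treating day 10 itself as still inside the window is an assumption, since the rule as stated does not say.

```python
from datetime import date

RETURN_WINDOW_DAYS = 10  # the example rule: returns allowed within 10 days of purchase

def return_allowed(purchase_date: date, return_date: date) -> bool:
    """Apply the business rule; the answer is a simple yes/no (True/False).
    Assumption: a return on day 10 still counts as within the window."""
    days = (return_date - purchase_date).days
    return 0 <= days <= RETURN_WINDOW_DAYS

print(return_allowed(date(2024, 5, 1), date(2024, 5, 9)))   # True
print(return_allowed(date(2024, 5, 1), date(2024, 5, 20)))  # False
```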

Logical view of information

focuses on how individual users logically access information to meet their own particular business needs.

Source data

identifies the primary location where data is collected. Can include invoices, spreadsheets, time sheets, transactions, and electronic sources such as other databases.

Dynamic information

includes data that change based on user actions, such as when a user requests information. By contrast, static websites supply only information that will not change until the content editor changes it.

Static information

includes fixed data incapable of change in the event of a user action.

Data broker

is a business that collects personal information about consumers and sells that information to other organizations.

Repository

is a central location in which data is stored and managed.

Recommendation engine

is a data mining algorithm that analyzes a customer's purchases and actions on a website and then uses the data to recommend complementary products.
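A minimal sketch of one simple recommendation approach, assuming hypothetical purchase histories: products bought by other customers who share items with this customer are scored by how much their purchases overlap, and the top scorers are recommended. Commercial engines use far more sophisticated methods.

```python
from collections import Counter

# Hypothetical purchase histories: customer -> set of products bought.
purchases = {
    "c1": {"tent", "sleeping bag", "lantern"},
    "c2": {"tent", "sleeping bag"},
    "c3": {"tent", "stove"},
    "c4": {"kayak", "paddle"},
}

def recommend(customer, top_n=2):
    """Recommend products that similar customers bought but this one has not."""
    owned = purchases[customer]
    scores = Counter()
    for other, items in purchases.items():
        overlap = owned & items
        if other == customer or not overlap:
            continue  # skip the customer and anyone with nothing in common
        for item in items - owned:
            scores[item] += len(overlap)  # weight by how similar the baskets are
    return [item for item, _ in scores.most_common(top_n)]

print(recommend("c2"))  # ['lantern', 'stove']
```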

Outlier

is a data value that is numerically distant from most of the other data points in a set of data. Anomaly detection helps to identify outliers in the data that can cause problems with mathematical modeling.
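A minimal outlier-detection sketch using Tukey's interquartile-range (IQR) fences, one common rule of thumb among several. It assumes hypothetical daily sales figures and Python 3.8 or later for `statistics.quantiles`.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Flag values outside the fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

daily_sales = [200, 210, 190, 205, 195, 215, 198, 2000]  # hypothetical
outliers = iqr_outliers(daily_sales)
print(outliers)  # [2000]

# Exclude the flagged values before fitting a model to the remaining data.
modeling_data = [v for v in daily_sales if v not in outliers]
```

Quartiles are used instead of the mean and standard deviation because a single extreme value barely moves them, so it cannot hide itself by inflating the spread.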

Information integrity

is a measure of the quality of information.

Prediction

is a statement about what will happen or might happen in the future; for example, predicting future sales or employee turnover.

Data map

is a technique for establishing a match, or balance, between the source data and the target data warehouse. This technique identifies data shortfalls and recognizes data issues. Can also alert managers to inconsistencies or help determine the cause and effects of enterprise-wide business decisions.

Data-driven decision management

is an approach to business governance that values decisions that can be backed up with verifiable data. The success of the data-driven approach is reliant upon the quality of the data gathered and the effectiveness of its analysis and interpretation.

Data point

is an individual item on a graph or a chart.

Data set

is an organized collection of data.

Virtualization

is the creation of a virtual (rather than actual) version of computing resources, such as an operating system, a server, a storage device, or network resource.

Information redundancy

is the duplication of data, or the storage of the same data in multiple places.

Content creator

is the person responsible for creating the original website content.

Content editor

is the person responsible for updating and maintaining website content.

Data mining

is the process of analyzing data to extract information not offered by the raw data alone.

A data element (or data field)

is the smallest or basic unit of information. Can include a customer's name, address, email, discount rate, preferred shipping method, product name, quantity ordered, and so on.

Database

maintains information about various types of objects (inventory), events (transactions), people (employees), and places (warehouses).

Data visualization tools

move beyond Excel graphs and charts into sophisticated analysis techniques such as controls, instruments, maps, time-series graphs, and more.

Infographics (information graphics)

present the results of data analysis, displaying the patterns, relationships, and trends in a graphical format. Are exciting and quickly convey a story users can understand without having to analyze numbers, tables, and boring charts.

Information granularity

refers to the extent of detail within the information (fine and detailed or coarse and abstract).

Data mining tools

use a variety of techniques to find patterns and relationships in large volumes of information that predict future behavior and guide decision making. They uncover trends and patterns, which analysts use to build models that, when exposed to new information sets, perform a variety of information analysis functions. Data mining tools for data warehouses help users uncover business intelligence in their data.

