chapter 6, MIS

Ace your homework & exams now with Quizwiz!

Data cleansing

*) Activities, software for detecting and correcting data in a database or file that are incorrect, incomplete, improperly formatted, or redundant. Also known as data scrubbing. *) Enforces consistency among different sets of data from separate information systems.

Select

*) Create a subset of rows that meet certain criteria.

Multidimensional Data Model

*) Cubes are aggregated set data's that helps into rapid analysis. *) OLAP cube can be understand that multidimensional array of datasets. *) For example a company might wish to analyze some financial data by product, by time-period, by city, by type of revenue and cost, and by comparing actual data with a budget. *) The OLAP cube consists of numeric facts called measures which are categorized by dimensions. *) Measures are derived from the records in the fact table and dimensions are derived from the dimension tables.

How a DBMS Solves the problems of the traditional file environment

*) DBMS reduces data redundancy and inconsistency by minimizing isolated files in which the same data are repeated. *) DBMS may not able to eliminate data redundancy but at least it can help control redundancy. *) DBMS can ensure every data has the same value so it can eliminate data inconsistency. *) DBMS uncouples programs and data, enabling data to stand on their own DBMS can increase access and availability of information and reduce the program development and maintenance costs. *) Because users and programmers can perform ad hoc queries of data in the database *) DBMS enables to centrally manage data, their use, and security.

The database approach to data management

*) Database technology solves many of the problems of traditional file organization. *) Database is a collection of data organized to serve many applications efficiently by centralizing the data and controlling redundant data. *) Data store in one location. *) Single database services multiple applications. E.g. create a single common human resources database.

Primary Key

*) Each table in a relational database has one field that is designated as it is a primary key.

Design process identifies

*) Identifies relationships among data elements *) Identifies the efficient way of grouping data elements together. *) Identifies redundant data elements. This is to meet business information requirements and the need of application programs.

Business intelligence

*) Information collected from multiple sources such as suppliers, customers, competitors, partners, and industries that analyzes patterns, trends, and relationships for strategic decision making *) Enhanced reporting system outputs and reports for planning or decision making

Database management systems (DBMS)

*) Is software that permits an organization to centralize data, manage them efficiently and provide access to the stored data by application programs. *) DBMS acts as an interface between application programs and the physical data files.

Foreign Key

*) Primary key used in second table as look-up field to identify records from original table.

Business intelligence - Principle tools include

*) Software for database query and reporting *) Online analytical processing (OLAP) *) Data mining

Object-Oriented DBMS (OODBMS)

*) Store data and procedures as objects. *) Object can be graphics, multimedia, Java applets. *) OODBMS slow compared with relational DBMS for processing large numbers of transactions

Data warehouse

*) Stores current and historical data form many operational transaction systems. *) Data warehousing provide query, analysis and reporting tools. *) Consolidates and standardizes information for use across enterprise, but data cannot be altered.

Online analytical processing (OLAP)

*) Supports multidimensional data analysis - Viewing data using multiple dimensions - Each aspect of information (product, pricing, cost, region, time period) is different dimension - E.g., how many washers sold in the East in June compared with other regions? *) OLAP enables rapid, online answers to ad hoc queries

Designing Databases

*) The database requires both a conceptual design and a physical design.

Key field

*) This key field is the unique identifier for all the information in any row of the table and this primary key cannot be duplicated.

Operations of a relational DBMS

*) Three basic operations used to develop useful sets of data.

Very large databases and systems require special capabilities, tools

*) To analyze large quantities of data *) To access data from multiple systems

Databases in the Cloud

*) Typically less functionality than on-premises DBs *) Amazon Web Services, Microsoft SQL Azure *) You only pay for the exact services you need.

A data dictionary

*) is an automated or manual file that stores definitions of data elements and their characteristics. *) Data dictionaries for large corporate databases may capture additional information, such as usage, ownership (who in the organization is responsible for maintaining the data), authorization; security, and the individuals, business functions, programs, and reports that use each data element.

Data manipulation language

*) is used to add, change, delete, and retrieve the data in the database.

Microsoft Access and other DBMS include capabilities for

*) report generation so that the data of interest can be displayed in a more structured and polished format than would be possible just by querying. *) Access also has capabilities for developing desktop system applications. *) These include tools for creating data entry screens, reports, and developing the logic for processing transactions.

Types of information obtainable from data mining

- Associations - Sequences - Classification - Clustering - Forecasting

Three Key techniques of analyzing large database

- Data warehousing - Data mining - Tools for accessing internal databases through the Web

Before new database in place, need to:

- Identify and correct faulty data - Establish better routines for editing data once database in operation

Data Marts

- Subset of data warehouse - Summarized or highly focused portion of firm's data for use by specific users - Typically focuses on single subject or line of business.

Capabilities of Database Management System

A DBMS includes capabilities and tools for organizing, managing, and accessing the data in the database. The most important are its data definition language, data dictionary, and data manipulation language.

Entity relationship diagram

A diagram that shows how different entities database are connected.

Field

A grouping of characters into a word, a group of words or complete number (person name or age Supplier name, street, city, and zipcode) and related fields can be grouped to form a record.

Data quality audit

A survey and/or sample of files to determine accuracy and completeness of data in an information system - Survey samples from data files, or - Survey end users for perceptions of quality

After normalization, the original relation ORDER has been broken down into four smaller relations. The relation ORDER is left with only two attributes and the relation LINE_ITEM has a combined, or concatenated, key consisting of Order_Number and Part_Number.

An unnormalized relation contains repeating groups. For example, there can be many parts and suppliers for each order. There is only a one-to-one correspondence between Order_Number and Order_Date

Web mining

Analyzes unstructured data (useful patterns and information) associated with websites to identify consumer behavior and website navigation from WWW

Text mining

Analyzes unstructured data to find trends and patterns in words and sentences.

Predictive analysis

Combines historical data with assumptions about future conditions to predict outcomes of events, such as future product sales or the probability that a customer will respond to an offer

Database administration

Defining, organizing, implementing, maintaining database; performed by database design and management group

Sequences

Events linked over time

Establishing an information policy

Firm's rules, procedures, roles for sharing, managing, standardizing data

An order can include more than one part, and it may be that several parts are supplied by the same supplier.

However, using this table, the supplier's name, and other information might need to be stored several times on the order list.

Web content mining

Knowledge extracted from content web pages

Ensuring data quality

More than 25% of critical data in Fortune 1000 company databases are inaccurate or incomplete

Associations

Occurrences linked to single event

6.1 Organization data in a traditional file environment

Poorly organized and maintained IS may result in business don't have timely, accurate, relevant information. So data management is so essential.

Classification

Recognizes patterns that describe group to which item belongs

Data governance

Refers policies and processes for managing availability, usability, integrity, and security of company data especially as it relates to government regulations.

Relational DBMS

Represent data as a two-dimensional table (called relations or files ) each table contains data on an entity and attributes.

Byte

Represents a single character (letter, number or symbol). Bytes can be grouped to form a field.

Fields (columns):

Represents attribute for entity

Bit

Represents the smallest unit of data either a 0 or a 1. Bits can be grouped to form a byte.

Clustering

Similar to classification when no groups have been defined; finds groupings within data

Data definition capability

Specifies structure of database content, used to create tables and define characteristics of fields.

Normalization

The process of creating small, stable, yet flexible and adaptive data structures from complex groups of data.

Web usage mining

User interaction data recorded by web server

Forecasting

Uses series of existing values to forecast what other values will be

File

a group of records of the same type (a group of record) and related files can be organized into a database.

Record

a group of related fields (student's name, a course taken, the date, the grade) related records can be collected to form a file.

Database

a group of related files makes up a database. Ex.) Telephone Book

The conceptual, or logical, design of a database is

an abstract model of the database from a business perspective

Hybrid object-relational DBMS

are now available to provide capabilities of both object-oriented and relational DBMS.

The data hierarchy starts with

bits, bytes, progresses to fields, records, files and databases

Replicated

central database duplicated in entirety at different locations.

Partitioned

different parts of databases store in separate locations

Most data quality problems stem from

faulty input

A computer system organizes data in

hierarchy

The most prominent data manipulation language today

is Structured Query Language, or SQL.

Entity

is a person, place, thing, or event on which we store and maintain information.

Crystal Reports

is a popular report generator for large corporate DBMS, although it can also be used with Access.

Attribute

is describing a particular entity such as student_ID, Course, Date, Grade

Data Mining (Market basket analysis)

is the process of discovering unknown patterns and relationships in large amounts of data to predict future behavior. *) E.g., Finding patterns in customer data for one-to-one marketing campaigns or to identify profitable customers *) - Predictive analysis - Text mining - Web mining (Link mining)

Web structure mining

links to and from web page

Join

operation combines relational tables to provide the user with more information that is available in individual tables.

Project

operation creates a subset consisting of columns in a table, permitting the user to create new tables that contain only the information required.

Rows (tuples):

referred to as records, records for different entities.

Data administration

responsible for specific policies and procedures to manage data

Physical design

shows how the database is actually arranged on direct-access storage devices.

Distributing databases

storing database in more than one place


Related study sets

Chapter 17: High-Risk Neonatal Nursing Care by D&C

View Set

Java - networking Socket Class Methods

View Set

Neuropsychology EPPP Test Questions

View Set

Insurance, Buying a Car and a Home

View Set

TCI 28 MC Questions - India's Comparative Advantage

View Set

Chapter 4: The General Journal and the General Ledger, Quiz

View Set

Business Chapter 9: Motivating Employees

View Set