chapter 6 MIS
entity examples
Examples: Customers, employees, parts, suppliers
In-memory computing is enabled by
High-speed processors, multicore processing, falling computer memory prices
Data collected by OLTP (on-line transaction processing)
Lacks tools for fast retrieval, data analysis, and does not contain historical data
Hadoop
Open-source software framework from Apache, designed for big data. Breaks a data task into sub-problems and distributes the processing to many inexpensive computer processing nodes. Combines the results into a smaller data set that is easier to analyze. Can process large quantities of any kind of data (structured transactional data, complex data, loosely structured data, unstructured audio and video data).
Data quality problems can be caused by
Redundant and inconsistent data produced by multiple systems; data input errors
Which of the following is not a step a firm might take to make sure they have a high level of data quality?
Using in-memory computing
Foreign key
a "look-up" field in a database that allows users to find related information in another database table
Data definition
a DBMS capability that specifies the structure and content of the database.
Entity
a general category representing a person, place, or thing about which we store and maintain information; the "subject" of the data
Field
a group of related bytes or characters representing an attribute (characteristic) about an entity (subject of the data); an individual element of data
Database
a group of related files about a specific entity (subject); ex. HR database
File
a group of related records about an entity (subject of the data); ex. Employee Benefits file, Employee Payroll file, Employee Job History file
An entity-relationship diagram
a methodology for documenting a database illustrating the relationship between various entities in the database.
In a database, each row represents __________.
a record
The select, project, and join operations
enable data from two different tables to be combined and only selected attributes to be displayed to create a report
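The three operations can be sketched with Python's built-in sqlite3 module. This is only an illustration: the SUPPLIER and PART tables, their column names, and the sample rows are hypothetical, not taken from the chapter.

```python
import sqlite3

# In-memory database with two hypothetical tables (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE supplier (
        supplier_number INTEGER PRIMARY KEY,
        supplier_name   TEXT
    );
    CREATE TABLE part (
        part_number     INTEGER PRIMARY KEY,
        part_name       TEXT,
        supplier_number INTEGER REFERENCES supplier(supplier_number)
    );
    INSERT INTO supplier VALUES (8259, 'Supplier A'), (8261, 'Supplier B');
    INSERT INTO part VALUES (137, 'Door latch', 8259),
                            (150, 'Door molding', 8261),
                            (152, 'Door lock', 8259);
""")

report = conn.execute("""
    SELECT part.part_number, part.part_name, supplier.supplier_name   -- project: keep only these columns
    FROM part
    JOIN supplier ON part.supplier_number = supplier.supplier_number  -- join: combine the two tables
    WHERE part.part_number IN (137, 150)                              -- select: keep only matching rows
    ORDER BY part.part_number
""").fetchall()

print(report)
```

The WHERE clause plays the role of select, the column list plays the role of project, and JOIN combines the tables through the shared supplier_number field.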
Sequences:
events linked over time
Hadoop
is used to process large quantities of structured and transactional data
Database administration
responsible for defining and organizing the structure and content of the database, and maintaining the database.
Traditional programming
separates the data from the program code (the operations or actions that act on them)
Which of the following best describes a data manipulation language?
A data manipulation language is associated with a database management system that end users and programmers use to manipulate data in the database.
Which of the following best describes a data quality audit?
A data quality audit is a structured survey of the accuracy and level of completeness of the data in an information system
Characteristics of high quality information include:
Accurate, complete, relevant, timely
Business Intelligence Infrastructure
Array of tools for obtaining useful information from internal and external systems and big data
Which of the following functions of an organization is responsible for information policy, as well as for data planning, data dictionary development, and monitoring data usage in the firm?
Data administration
Which of the following can be used to automatically enforce consistency among different sets of data?
Data cleansing
Big Data
Datasets with volumes so large they are beyond the ability of a typical relational DBMS to capture, store, and analyze. Are often unstructured or semi-structured data from Internet and networked services and applications. Billions or trillions of records that accumulate more rapidly than traditional data. Provide more patterns and insights than smaller datasets.
data driven website characteristics
Improves access to and updating of information. Useful for e-commerce sites, news sites, forums and discussion groups, subscription services.
a key field
In a database, __________ is used to uniquely identify each record for retrieval or manipulation.
Sentiment analysis
Mines text comments posted online or sent in e-mail to measure customer sentiment
Analytical Tools for business intelligence
OLAP, Data Mining, Text Mining, Web Mining
Relational database model:
Organizes data into two-dimensional tables (relations) with columns and rows. One table for each entity, e.g., CUSTOMER, SUPPLIER, PART, SALES.
Analytic Platforms
Preconfigured hardware-software systems such as IBM's Netezza and Oracle Big Data Appliance. Designed for query processing and analytics on very large datasets. Use both relational and non-relational technology to analyze large data sets. Also include in-memory systems and NoSQL DBMS.
Normalization
Process of streamlining complex groups of data to: minimize redundant data elements; minimize awkward many-to-many relationships; increase stability and flexibility
Cloud Databases
Relational database engines provided by cloud computing services, such as Amazon. Pricing based on usage. Reduced investment in hardware and software. Appeal to Web-focused businesses and to small or medium-sized businesses seeking lower costs than developing and hosting in-house databases.
Referential integrity rules
Rules used by relational databases to ensure that relationships between coupled tables remain consistent
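A minimal sketch of such a rule in action, using Python's sqlite3 module. The tables, column names, and the helper try_insert_part are hypothetical. Note that SQLite enforces foreign keys only after the PRAGMA shown below; other DBMS products behave differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when this is enabled
conn.execute(
    "CREATE TABLE supplier (supplier_number INTEGER PRIMARY KEY, supplier_name TEXT)"
)
conn.execute(
    """CREATE TABLE part (
           part_number     INTEGER PRIMARY KEY,
           part_name       TEXT,
           supplier_number INTEGER NOT NULL REFERENCES supplier(supplier_number)
       )"""
)
conn.execute("INSERT INTO supplier VALUES (1, 'Supplier A')")


def try_insert_part(part_number, part_name, supplier_number):
    """Hypothetical helper: return True if the DBMS accepts the row, False if it rejects it."""
    try:
        conn.execute(
            "INSERT INTO part VALUES (?, ?, ?)",
            (part_number, part_name, supplier_number),
        )
        return True
    except sqlite3.IntegrityError:
        return False


valid_accepted = try_insert_part(100, "Door latch", 1)    # supplier 1 exists
orphan_accepted = try_insert_part(101, "Door lock", 99)   # supplier 99 does not exist
print(valid_accepted, orphan_accepted)
```

The second insert is refused because it would create a part that points to a supplier with no matching row, which is exactly the inconsistency the rule is meant to prevent.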
Data Mart
Smaller version of a data warehouse that contains a subset of data, usually for a single aspect of a firm's business. Used by a single department or function. More limited in scope than a data warehouse and customized to support decision making for a particular end-user group.
Database Management System (DBMS)
Software for creating, storing, organizing, and accessing data from a database; separates the logical and physical views of the data
data definition capabilities:
Specify the structure and content of the database. Used to create the database tables and to define the characteristics of the fields in each table.
Information policy
States organization's rules for organizing, managing, storing, sharing information
Data dictionary
Stores definitions of data elements and their characteristics. Example entry: Name of data item: SSN; Description: Social Security Number; Size: 11 bytes; Type: alphanumeric; Format: xxx-xx-xxxx; Default value/range of allowable values: e.g., $10-12/hr for a specific job classification
data warehouse characteristics
Supports business analysis activities and decision-making tasks. Stores 3-10 years of historical data. Supports managerial decision making (analysis). Can be accessed but not altered. Is regularly updated and cleansed. Requires heavy-duty processing power and storage capacity. Uses queries to analyze past data in order to spot trends and patterns.
Online Analytical Processing (OLAP)
Supports multidimensional data analysis, enabling users to view the same data in different ways using multiple dimensions. Enables users to obtain online answers to ad hoc questions fairly rapidly.
database administration
The __________ functions of an organization are responsible for defining and maintaining a database. These functions are performed by a database design and management group
Which of the following statements about the power of a relational DBMS is false?
The relational database has become the primary method for organizing and maintaining data in information systems because it is so rigidly controlled.
Record
a group of related fields; a collection of attributes (characteristics) about an entity (subject of the data) - there will be one record for every entity in the file; (there will be one record for every employee, part, etc.)
In a database, each column represents __________.
an attribute or a field
Data-driven web site
an interactive web site that serves as an interface to a database & is kept constantly updated and relevant to the needs of its customers/users
Data Mining
analyzes large pools of data to find hidden patterns and relationships in large databases, data warehouses or data marts and infers rules from them to predict future behavior and guide decision-making
hadoop
breaks a big data problem down into sub-problems, distributes them among up to thousands of inexpensive computer processing nodes, and then combines the result into a smaller data set that is easier to analyze.
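The split, distribute, and combine idea can be sketched in plain Python. This is not Hadoop's API: the "nodes" here are simulated by ordinary function calls run one after another, and the function names and log lines are made up for illustration.

```python
from collections import Counter


def split_into_chunks(records, n_chunks):
    """Break the full data set into n_chunks smaller sub-problems."""
    return [records[i::n_chunks] for i in range(n_chunks)]


def process_chunk(chunk):
    """Work one 'node' would do: count the words in its share of the data."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def combine(partial_results):
    """Merge the per-node results into one smaller, easier-to-analyze data set."""
    total = Counter()
    for partial in partial_results:
        total.update(partial)
    return total


log_lines = [
    "error disk full",
    "ok",
    "error network down",
    "ok",
    "error disk full",
]

chunks = split_into_chunks(log_lines, 3)               # sub-problems
partial_counts = [process_chunk(c) for c in chunks]    # would run on separate nodes
word_counts = combine(partial_counts)                  # combined result
print(word_counts.most_common(3))
```

In a real cluster each call to process_chunk would run on a different machine; the sketch only shows the shape of the computation.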
Poor data quality
can create a major obstacle to successful customer relationship management as well as serious operational and financial problems
Data manipulation language
commands used to add, delete, change and retrieve data from the database
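SQL is the most widely used data manipulation language. The four kinds of commands can be sketched as follows with Python's sqlite3 module; the EMPLOYEE table and its rows are hypothetical, and the CREATE TABLE statement is data definition rather than data manipulation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data definition (not DML): create the table the commands below act on.
conn.execute(
    "CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, last_name TEXT, hourly_rate REAL)"
)

# Add: INSERT puts new records into the table.
conn.execute("INSERT INTO employee VALUES (1, 'Lee', 10.0)")
conn.execute("INSERT INTO employee VALUES (2, 'Ortiz', 11.5)")

# Change: UPDATE modifies fields of existing records.
conn.execute("UPDATE employee SET hourly_rate = 12.0 WHERE employee_id = 1")

# Delete: DELETE removes records.
conn.execute("DELETE FROM employee WHERE employee_id = 2")

# Retrieve: SELECT reads records back.
rows = conn.execute(
    "SELECT employee_id, last_name, hourly_rate FROM employee ORDER BY employee_id"
).fetchall()
print(rows)
```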
intrusion detection system
computer program that senses when another computer is attempting to scan the disk or otherwise access a computer.
Objects
data and actions that can be performed on the data (methods)
David creates a central database by extracting, transforming, and loading metadata from various internal and external sources of information. He plans to use this database for executing various functions such as intelligence gathering, strategic planning, and analysis. This central repository of information is referred to as a
data warehouse
Clustering:
discovering "as yet unclassified" groupings
Text Mining
discovery of patterns and relationships from large sets of unstructured data; allows businesses to extract key elements from large unstructured data sets, discover patterns and relationships, and summarize the information
Primary key
field that uniquely identifies a given record (row) in a table
physical view:
how data are actually structured and organized; where is the data actually located? There is only 1 physical view
Attribute
specific characteristic of an entity. Examples: supplier name, supplier street address, part number, zip code, employee last name
Data Warehouse
stores current and historical data of potential interest to organizational decision makers; Gathered from many different operational databases
Data quality audit
structured survey of the accuracy and completeness of data
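The completeness half of an audit can be sketched as a per-field fill rate. The function name audit_completeness and the customer records are hypothetical. Measuring accuracy would additionally require comparing values against a trusted source, which this sketch does not attempt.

```python
def audit_completeness(records, fields):
    """Return, for each field, the fraction of records in which it is filled in."""
    if not records:
        return {field: 0.0 for field in fields}
    report = {}
    for field in fields:
        # A field counts as filled if it holds a non-blank value.
        # (Assumes text values; None and empty strings count as missing.)
        filled = sum(1 for r in records if str(r.get(field) or "").strip())
        report[field] = filled / len(records)
    return report


customers = [
    {"name": "Ann Lee",   "zip": "10001"},
    {"name": "Bo Ortiz",  "zip": None},
    {"name": "Cy Young",  "zip": "60601"},
    {"name": "Di Russo",  "zip": "94105"},
]

completeness = audit_completeness(customers, ["name", "zip"])
print(completeness)
```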
A non-relational database management system
system for working with large quantities of structured and unstructured data that would be difficult to analyze with a relational model.
Online analytical processing (OLAP) is best defined as
the capability for manipulating and analyzing large volumes of data from multiple perspectives
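Viewing the same data from multiple perspectives can be sketched as totaling one set of sales facts along different combinations of dimensions. The sales figures, dimension names, and the summarize function are made up for illustration; real OLAP tools do this at far larger scale.

```python
from collections import defaultdict

# Hypothetical sales facts with three dimensions: product, region, month.
sales = [
    {"product": "nuts",  "region": "East", "month": "Jan", "units": 50},
    {"product": "nuts",  "region": "West", "month": "Jan", "units": 30},
    {"product": "bolts", "region": "East", "month": "Jan", "units": 20},
    {"product": "nuts",  "region": "East", "month": "Feb", "units": 40},
    {"product": "bolts", "region": "West", "month": "Feb", "units": 10},
]


def summarize(facts, *dimensions):
    """Total the units sold, grouped by the chosen dimensions."""
    totals = defaultdict(int)
    for fact in facts:
        key = tuple(fact[d] for d in dimensions)
        totals[key] += fact["units"]
    return dict(totals)


by_region = summarize(sales, "region")                      # one perspective
by_product_region = summarize(sales, "product", "region")   # another perspective
by_month = summarize(sales, "month")                        # a third perspective
print(by_region, by_product_region, by_month)
```

Each call answers a different ad hoc question from the same underlying records, without changing how the records are stored.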
Data Hierarchy
the structure and organization of data (involves bits, characters/bytes, fields, records, files, databases)
Object Oriented programming and databases
tie the data and program code together in objects and then manipulate the objects to create a program
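A minimal sketch of an object that ties data and code together. The Employee class, its attributes, and its method are hypothetical examples, not part of the chapter.

```python
class Employee:
    """An object bundles data (attributes) with the actions on that data (methods)."""

    def __init__(self, last_name, hourly_rate):
        # Data held by the object.
        self.last_name = last_name
        self.hourly_rate = hourly_rate

    def give_raise(self, percent):
        # Method: an action performed on the object's own data.
        self.hourly_rate += self.hourly_rate * percent / 100


lee = Employee("Lee", 10.0)
lee.give_raise(10)
print(lee.last_name, lee.hourly_rate)
```

In traditional programming the pay rate would live in a data file and the raise calculation in a separate program; here both travel together in the object.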
Key field
used to identify individual records
Report generation
users can define report formats
Forecasting
uses a series of existing values to forecast future values
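One simple way to do this is a moving average, sketched below. It is only one of many forecasting techniques and is not necessarily what a given data mining tool uses; the function name and the sales figures are hypothetical.

```python
def moving_average_forecast(values, window=3):
    """Forecast the next value as the mean of the most recent `window` values."""
    if window <= 0 or len(values) < window:
        raise ValueError("need at least `window` existing values")
    recent = values[-window:]
    return sum(recent) / window


monthly_sales = [100, 110, 120, 130]   # existing values
next_month = moving_average_forecast(monthly_sales, window=3)
print(next_month)
```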
An information policy
would specify that only selected members of the payroll and human resources department would have the right to change sensitive employee data, such as an employee's salary and social security number, and that these departments are responsible for making sure that such employee data are accurate.
Database administration
Database design and management group responsible for defining and organizing the structure and content of the database, and maintaining the database
Non-Relational Databases
Developed to handle large sets of data that are not easily organized into tables, columns, and rows
Web Mining
Discovery and analysis of useful patterns and information from the Web
Why is a relational DBMS so powerful?
Relational database products are available as cloud computing services.
In-Memory Computing
Relies on the computer's main memory (RAM) for data storage. Accesses data stored in system primary memory, which eliminates bottlenecks in retrieving and reading data from hard-disk-based databases. Dramatically shortens query response times. Lowers processing costs by optimizing the use of memory and accelerating processing performance.
Data administration
Responsible for specific policies and procedures through which data can be managed as a resource
"NoSQL" databases: Non-relational DBMS
Use a more flexible data model. Don't require extensive structuring. Can manage unstructured data, such as social media and graphics.
Which of the following is not a step a firm might take to make sure they have a high level of data quality?
Using data mining
data dictionary
automated or manual tool for storing and organizing information about the data maintained in a database.
Data cleansing:
detects and corrects incorrect, incomplete, improperly formatted, and redundant data
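A minimal sketch of a cleansing pass over customer records. The field names, the rules, and the cleanse function are assumptions chosen for illustration. Improperly formatted values are corrected, redundant records are dropped, and incomplete records are set aside for review because they cannot be repaired automatically.

```python
import re


def cleanse(records):
    """Return (clean, rejected): standardized unique records and records needing review."""
    seen_ssns = set()
    clean, rejected = [], []
    for rec in records:
        # Fix formatting: trim whitespace and standardize capitalization.
        name = (rec.get("name") or "").strip().title()
        # Keep only the digits of the SSN, whatever punctuation was typed.
        digits = re.sub(r"\D", "", rec.get("ssn") or "")

        # Detect incomplete or incorrect data: missing name or wrong SSN length.
        if not name or len(digits) != 9:
            rejected.append(rec)
            continue

        ssn = f"{digits[:3]}-{digits[3:5]}-{digits[5:]}"   # standard xxx-xx-xxxx format

        # Detect redundant data: skip records whose SSN was already seen.
        if ssn in seen_ssns:
            continue
        seen_ssns.add(ssn)
        clean.append({"name": name, "ssn": ssn})
    return clean, rejected


raw = [
    {"name": " ann lee ", "ssn": "123456789"},     # badly formatted
    {"name": "Ann Lee",   "ssn": "123-45-6789"},   # duplicate of the first
    {"name": "Bo Ortiz",  "ssn": "12345"},         # incomplete SSN
    {"name": "",          "ssn": "987-65-4321"},   # missing name
]

clean, rejected = cleanse(raw)
print(clean)
print(len(rejected))
```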
Organizations perform data quality audits to __________.
determine accuracy and level of completeness of data
Logical view:
how end users view data; what data does an individual user need? There can be more than 1 logical view
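Multiple logical views over a single stored table can be sketched with SQL views, here through Python's sqlite3 module. The EMPLOYEE table, its columns, and the two view names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One stored table: the data exists only once.
    CREATE TABLE employee (
        employee_id INTEGER PRIMARY KEY,
        name        TEXT,
        ssn         TEXT,
        health_plan TEXT,
        gross_pay   REAL
    );
    INSERT INTO employee VALUES (1, 'Lee', '123-45-6789', 'Plan A', 65000.0);

    -- Two logical views: each user group sees only the fields it needs.
    CREATE VIEW benefits_view AS SELECT name, ssn, health_plan FROM employee;
    CREATE VIEW payroll_view  AS SELECT name, ssn, gross_pay   FROM employee;
""")

benefits_cursor = conn.execute("SELECT * FROM benefits_view")
benefits_columns = [col[0] for col in benefits_cursor.description]
benefits_rows = benefits_cursor.fetchall()

payroll_cursor = conn.execute("SELECT * FROM payroll_view")
payroll_columns = [col[0] for col in payroll_cursor.description]
payroll_rows = payroll_cursor.fetchall()

print(benefits_columns, benefits_rows)
print(payroll_columns, payroll_rows)
```

Both views read from the same employee table, so the benefits specialist and the payroll clerk each get their own picture of the data without a second copy being stored.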
The organization's rules for sharing, disseminating, acquiring, standardizing, classifying, and inventorying information are specified by
information policies
Normalization
involves the process of creating small, stable, yet flexible data structures from complex groups of data when designing a relational database.
Data mining
is a type of intelligence gathering that uses statistical techniques to explore records in a data warehouse, hunting for hidden patterns and relationships that are undetectable in routine reports.
Structure mining
mines Web site structural elements, such as links
Content mining
mines content of Web sites
Usage mining
mines user interaction data gathered by Web servers
Associations:
occurrences linked to a single event
relational database
organizes its data in the form of tables and represents relationships using foreign keys.
Classifications:
patterns describing a group an item belongs to