Management Information Systems - Chapter 6
Bit
A bit represents the smallest unit of data a computer can handle.
Query
A query is a request for data from a database.
Associations
Associations are occurrences linked to a single event; for example, a purchase of one product may tend to be accompanied by the purchase of another in the same transaction.
Attributes
Each entity has specific characteristics called attributes.
Database Server
In a client/server environment, the DBMS resides on a dedicated computer called a database server.
Online Analytical Processing (OLAP)
OLAP supports multidimensional data analysis, enabling users to view the same data in different ways using multiple dimensions. Each aspect of information—product, pricing, cost, region, or time period—represents a different dimension.
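A minimal sketch of what "viewing the same data along different dimensions" means, using a small in-memory table; the table and column names here are hypothetical illustrations, not a real OLAP engine:

```python
import sqlite3

# The same sales data viewed along two dimensions: product and region.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("Nuts", "East", 100), ("Nuts", "West", 50),
     ("Bolts", "East", 75), ("Bolts", "West", 125)],
)

# Dimension 1: totals by product
by_product = dict(conn.execute(
    "SELECT product, SUM(amount) FROM sales GROUP BY product"))
# Dimension 2: totals by region
by_region = dict(conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region"))

print(by_product)  # {'Bolts': 200.0, 'Nuts': 150.0}
print(by_region)   # {'East': 175.0, 'West': 175.0}
```

A real OLAP tool precomputes and navigates many such dimension combinations (product by region by time period) rather than running each aggregation on demand.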
Data Dictionary
A data dictionary is an automated or manual file that stores definitions of data elements and their characteristics.
Data Mart
A data mart is a subset of a data warehouse, in which a summarized or highly focused portion of the organization's data is placed in a separate database for a specific population of users.
Database Management System (DBMS)
A database management system (DBMS) is a specific type of software for creating, storing, organizing, and accessing data from a database. The DBMS relieves the end user or programmer from the task of understanding where and how the data are actually stored by separating the logical and physical views of the data. The logical view presents data as end users or business specialists would perceive them, whereas the physical view shows how data are actually organized and structured on physical storage media, such as a hard disk.
Key Field
A field that uniquely identifies each record so that the record can be retrieved, updated, or sorted.
Byte
A group of bits, called a byte, represents a single character, which can be a letter, a number, or another symbol.
File
A group of records of the same type is called a file.
Record
A group of related fields, such as a student's identification number (ID), the course taken, the date, and the grade, comprises a record.
Database
A group of related files makes up a database. Databases are at the heart of all information systems because they keep track of the people, places, and things that a business must deal with on a continuing, often instant basis.
Field
A grouping of characters into a word, a group of words, or a complete number (such as a person's name or age) is called a field.
Database Administration
A large organization will also have a database design and management group within the corporate information systems division that is responsible for defining and organizing the structure and content of the database and maintaining it. In close cooperation with users, the design group establishes the physical database, the logical relations among elements, and the access rules and security procedures. The functions it performs are called database administration.
Entity-Relationship Diagram
A schematic called an entity-relationship diagram clarifies table relationships in a relational database. The most important piece of information an entity-relationship diagram provides is the manner in which two tables are related to each other. Tables in a relational database may have one-to-one, one-to-many, and many-to-many relationships.
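The one-to-many case, the most common relationship an entity-relationship diagram depicts, can be sketched with two linked tables (the table and column names are illustrative):

```python
import sqlite3

# One customer row relates to many order rows through a foreign key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                     customer_id INTEGER REFERENCES customer(customer_id));
INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO orders VALUES (10, 1), (11, 1), (12, 2);
""")

# Joining the tables recovers the relationship the diagram depicts:
# Ann has two orders, Bob has one.
rows = conn.execute("""
    SELECT c.name, COUNT(o.order_id)
    FROM customer c JOIN orders o ON c.customer_id = o.customer_id
    GROUP BY c.name ORDER BY c.name
""").fetchall()
print(rows)  # [('Ann', 2), ('Bob', 1)]
```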
Forecasting
Although other data mining applications also involve predictions, forecasting uses predictions in a different way: it uses a series of existing values to forecast what future values will be.
Information Policy
An information policy specifies the organization's rules for sharing, disseminating, acquiring, standardizing, classifying, and inventorying information. Information policies identify which users and organizational units can share information, where information can be distributed, and who is responsible for updating and maintaining the information.
In-Memory Computing
Another way of facilitating big data analysis is to use in-memory computing, which relies primarily on a computer's main memory (RAM) for data storage. (Conventional DBMS use disk storage systems.) Users access data stored in the system's primary memory, thereby eliminating bottlenecks from retrieving and reading data in a traditional, disk-based database and dramatically shortening query response times. In-memory processing makes it possible for very large sets of data, amounting to the size of a data mart or small data warehouse, to reside entirely in memory. Complex business calculations that used to take hours or days can be completed within seconds, and this can even be accomplished on handheld devices.
Data Quality Audit
Before a new database is in place, organizations need to identify and correct their faulty data and establish better routines for editing data once their database is in operation. Analysis of data quality often begins with a data quality audit, which is a structured survey of the accuracy and level of completeness of the data in an information system. Data quality audits can be performed by surveying entire data files, surveying samples from data files, or surveying end users for their perceptions of data quality.
Classification
Classification recognizes patterns that describe the group to which an item belongs by examining existing items that have been classified and by inferring a set of rules.
Clustering
Clustering works in a manner similar to classification, but is used when no groups have yet been defined. A data mining tool can discover different groupings within data on its own.
Analytic Platforms
Commercial database vendors have developed specialized high-speed analytic platforms that use both relational and nonrelational technology optimized for analyzing large data sets. These analytic platforms feature preconfigured hardware-software systems specifically designed for query processing and analytics.
Data Definition
DBMS have a data definition capability to specify the structure of the content of the database. This capability is used to create database tables and to define the characteristics of the fields in each table. This information about the database is documented in a data dictionary.
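As a sketch, the `CREATE TABLE` statement below is a data definition, and SQLite's `PRAGMA table_info` then plays the role of a very small data dictionary, reporting each field's stored definition back (the table and field names are hypothetical):

```python
import sqlite3

# Data definition: create a table and specify each field's characteristics.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE supplier (
        supplier_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        city        TEXT
    )
""")

# Data-dictionary-style listing: field name and data type for each field.
for cid, name, col_type, *_ in conn.execute("PRAGMA table_info(supplier)"):
    print(name, col_type)
# supplier_id INTEGER
# name TEXT
# city TEXT
```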
Report Generator
DBMS typically include capabilities for report generation so that the data of interest can be displayed in a more structured and polished format than would be possible just by querying.
Data Administration
Data administration is responsible for the specific policies and procedures through which data can be managed as an organizational resource. These responsibilities include developing information policy, planning for data, overseeing logical database design and data dictionary development, and monitoring how information systems specialists and end-user groups use data.
Data Cleansing
Data cleansing, also known as data scrubbing, consists of activities for detecting and correcting data in a database that are incorrect, incomplete, improperly formatted, or redundant. Data cleansing not only corrects data but also enforces consistency among different sets of data that originated in separate information systems. Specialized data-cleansing software is available to survey data files automatically, correct errors in the data, and integrate the data in a consistent, company-wide format.
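A minimal data-cleansing sketch in plain Python: standardize formatting across records that originated in different systems, then drop the redundant duplicates that standardization exposes (the field names and cleaning rules are hypothetical):

```python
# Raw records with inconsistent formatting and a hidden duplicate.
raw = [
    {"name": " ann smith ", "state": "ny"},
    {"name": "Ann Smith",   "state": "NY"},   # duplicate of the first
    {"name": "bob jones",   "state": "nj"},
]

seen, clean = set(), []
for rec in raw:
    # Correct improperly formatted data: trim whitespace, normalize case.
    fixed = {"name": rec["name"].strip().title(),
             "state": rec["state"].strip().upper()}
    # Remove redundant records that become identical after cleansing.
    key = (fixed["name"], fixed["state"])
    if key not in seen:
        seen.add(key)
        clean.append(fixed)

print(clean)
# [{'name': 'Ann Smith', 'state': 'NY'}, {'name': 'Bob Jones', 'state': 'NJ'}]
```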
Data Mining
Data mining provides insights into corporate data that cannot be obtained with OLAP by finding hidden patterns and relationships in large databases and inferring rules from them to predict future behavior. The patterns and rules are used to guide decision making and forecast the effect of those decisions. The types of information obtainable from data mining include associations, sequences, classifications, clusters, and forecasts.
Primary Key
Each table in a relational database has one field designated as its primary key. This key field is the unique identifier for all the information in any row of the table, and this primary key cannot be duplicated.
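The "cannot be duplicated" rule is enforced by the DBMS itself, as this small sketch shows (table and field names are illustrative):

```python
import sqlite3

# The primary key uniquely identifies each row; the DBMS rejects a
# second row with the same key value.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO student VALUES (1, 'Ann')")

try:
    conn.execute("INSERT INTO student VALUES (1, 'Bob')")  # duplicate key
except sqlite3.IntegrityError:
    print("rejected: duplicate primary key")
```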
Hadoop
For handling unstructured and semistructured data in vast quantities, as well as structured data, organizations are starting to use Hadoop. Hadoop is an open-source software framework, managed by the Apache Software Foundation, that enables distributed parallel processing of huge amounts of data across inexpensive computers. It breaks a big data problem down into subproblems, distributes them among up to thousands of inexpensive computer processing nodes, and then combines the results into a smaller data set that is easier to analyze.
Sequences
In sequences, events are linked over time.
Data Manipulation Language
Most DBMS have a specialized language called a data manipulation language that is used to add, change, delete, and retrieve the data in the database. This language contains commands that permit end users and programming specialists to extract data from the database to satisfy information requests and develop applications.
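The four operations named above (add, change, delete, retrieve) map directly onto SQL's `INSERT`, `UPDATE`, `DELETE`, and `SELECT`; a small sketch with an illustrative table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE course (id INTEGER PRIMARY KEY, grade TEXT)")

conn.execute("INSERT INTO course VALUES (1, 'B')")          # add
conn.execute("UPDATE course SET grade = 'A' WHERE id = 1")  # change
grade = conn.execute(
    "SELECT grade FROM course WHERE id = 1").fetchone()[0]  # retrieve
print(grade)  # A
conn.execute("DELETE FROM course WHERE id = 1")             # delete
print(conn.execute("SELECT COUNT(*) FROM course").fetchone()[0])  # 0
```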
Nonrelational Database Management Systems
Nonrelational database management systems use a more flexible data model and are designed for managing large data sets across many distributed machines and for easily scaling up or down. They are useful for accelerating simple queries against large volumes of structured and unstructured data, including web, social media, graphics, and other forms of data that are difficult to analyze with traditional SQL-based tools.
Referential Integrity
Relational database systems enforce referential integrity rules to ensure that relationships between coupled tables remain consistent. When one table has a foreign key that points to another table, you may not add a record to the table with the foreign key unless there is a corresponding record in the linked table.
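A sketch of the rule in action: with foreign-key enforcement turned on, a record whose foreign key has no matching record in the linked table is rejected (SQLite requires opting in with a PRAGMA; table names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only if asked
conn.executescript("""
    CREATE TABLE supplier (supplier_id INTEGER PRIMARY KEY);
    CREATE TABLE part (
        part_id INTEGER PRIMARY KEY,
        supplier_id INTEGER REFERENCES supplier(supplier_id)
    );
""")
conn.execute("INSERT INTO supplier VALUES (1)")
conn.execute("INSERT INTO part VALUES (10, 1)")   # OK: supplier 1 exists

try:
    conn.execute("INSERT INTO part VALUES (11, 99)")  # no supplier 99
except sqlite3.IntegrityError:
    print("rejected: foreign key has no matching record")
```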
Tuples
Rows are commonly referred to as records, or, in very technical terms, as tuples.
Sentiment Analysis
Sentiment analysis software can mine text comments in an email message, blog, social media conversation, or survey form to detect favorable and unfavorable opinions about specific subjects.
Structured Query Language (SQL)
The most prominent data manipulation language today is Structured Query Language, or SQL.
Normalization
The process of streamlining complex groups of data to minimize redundant data elements and awkward many-to-many relationships and increase stability and flexibility is called normalization. A properly designed and normalized database is easy to maintain and minimizes duplicate data.
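A before-and-after sketch of the idea, using hypothetical part/supplier data: the flat form repeats supplier details on every part row, while the normalized form stores each supplier once and lets parts carry only the supplier key.

```python
# Un-normalized: supplier attributes are duplicated on every part row.
flat = [
    {"part": "Bolt", "supplier": "Acme", "supplier_city": "Dayton"},
    {"part": "Nut",  "supplier": "Acme", "supplier_city": "Dayton"},
    {"part": "Gear", "supplier": "Best", "supplier_city": "Austin"},
]

# Normalized: supplier attributes live in one table keyed by supplier,
# and each part row carries only the supplier key.
suppliers = {r["supplier"]: {"city": r["supplier_city"]} for r in flat}
parts = [{"part": r["part"], "supplier": r["supplier"]} for r in flat]

print(len(suppliers))  # 2  (Acme's city is stored once, not twice)
print(parts[0])        # {'part': 'Bolt', 'supplier': 'Acme'}
```

If Acme moves, the normalized design requires updating one row rather than every part row, which is the stability the definition refers to.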
Relational Database
The relational database is the most common type of database today. Relational databases organize data into two-dimensional tables (called relations) with columns and rows. Each table contains data about an entity and its attributes. For the most part, there is one table for each business entity, so, at the most basic level, you will have one table for customers and a table each for suppliers, parts in inventory, employees, and sales transactions.
Data Warehouse
The traditional tool for analyzing corporate data for the past two decades has been the data warehouse. A data warehouse is a database that stores current and historical data of potential interest to decision makers throughout the company. The data originate in many core operational transaction systems, such as systems for sales, customer accounts, and manufacturing, and can include data from website transactions. The data warehouse extracts current and historical data from multiple operational systems inside the organization. These data are combined with data from external sources and transformed by correcting inaccurate and incomplete data and restructuring the data for management reporting and analysis before being loaded into the data warehouse. The data warehouse makes the data available for anyone to access as needed, but the data cannot be altered. A data warehouse system also provides a range of ad-hoc and standardized query tools, analytical tools, and graphical reporting facilities.
Web Mining
The web is another rich source of unstructured big data for revealing patterns, trends, and insights into customer behavior. The discovery and analysis of useful patterns and information from the World Wide Web is called web mining. Web mining looks for patterns in data through content mining, structure mining, and usage mining. Web content mining is the process of extracting knowledge from the content of web pages, which may include text, image, audio, and video data. Web structure mining examines data related to the structure of a particular website. For example, links pointing to a document indicate the popularity of the document; links coming out of a document indicate the richness or perhaps the variety of topics covered in the document. Web usage mining examines user interaction data a web server records whenever requests for a website's resources are received. The usage data records the user's behavior when the user browses or makes transactions on the website and collects the data in a server log. Analyzing such data can help companies determine the value of particular customers, cross-marketing strategies across products, and the effectiveness of promotional campaigns.
Big Data
There has been an explosion of data from web traffic, email messages, and social media content (tweets, status messages) as well as machine-generated data from sensors. These data may be unstructured or semistructured and thus not suitable for relational database products that organize data in the form of columns and rows. We now use the term big data to describe these data sets with volumes so huge that they are beyond the ability of typical DBMS to capture, store, and analyze. Big data doesn't designate any specific quantity but usually refers to data in the petabyte and exabyte range—in other words, billions to trillions of records, respectively, from different sources. Big data are produced in much larger quantities and much more rapidly than traditional data.
Text Mining
Text mining tools can extract key elements from unstructured big data sets, discover patterns and relationships, and summarize the information.
Entity
To run a business, you most likely will be using data about categories of information such as customers, suppliers, employees, orders, products, shippers, and perhaps parts. Each of these generalized categories representing a person, place, or thing on which we store information is called an entity.