Chapter 6 MIS
record
A group of related fields, such as a student's identification number (ID), the course taken, the date, and the grade
field
A grouping of characters into a word, a group of words, or a complete number (such as a person's name or age)
entity relationship diagram
A methodology for documenting databases illustrating the relationship between various entities in the database.
Foreign Key
A primary key of one table that appears as an attribute in another table, providing a logical relationship between the two tables. In a PART table, for example, the foreign key is essentially a look-up field for finding data about the supplier of a specific part. Note that the PART table would itself have its own primary key field, Part_Number, to identify each part uniquely.
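The look-up role of a foreign key can be sketched with Python's built-in sqlite3 module. This is only an illustration: the SUPPLIER table, the Supplier_Number and Supplier_Name columns, and the sample rows are assumptions made up for the example; only PART and Part_Number come from the definition above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SUPPLIER (
    Supplier_Number INTEGER PRIMARY KEY,   -- primary key of SUPPLIER (assumed name)
    Supplier_Name   TEXT NOT NULL
);
CREATE TABLE PART (
    Part_Number     INTEGER PRIMARY KEY,   -- PART's own primary key
    Part_Name       TEXT NOT NULL,
    Supplier_Number INTEGER REFERENCES SUPPLIER(Supplier_Number)  -- foreign key
);
INSERT INTO SUPPLIER VALUES (8259, 'Example Supplier Inc.');  -- hypothetical data
INSERT INTO PART VALUES (137, 'Door latch', 8259);            -- hypothetical data
""")

# Follow the foreign key from a part to the supplier that provides it.
row = conn.execute(
    """
    SELECT s.Supplier_Name
    FROM PART AS p
    JOIN SUPPLIER AS s ON s.Supplier_Number = p.Supplier_Number
    WHERE p.Part_Number = ?
    """,
    (137,),
).fetchone()
supplier_name = row[0]
print(supplier_name)
```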
Which statement about big data is FALSE?
Big data can be processed with traditional techniques.
True Statements about Hadoop
Hadoop breaks a big data problem down into sub-problems. Hadoop is an open source software framework. Hadoop combines results into a smaller data set that is easier to analyze. Hadoop's MapReduce was inspired by Google's system for processing huge data sets.
Which tool enables users to view the same data in different ways using multiple dimensions?
OLAP
What is the first step in effectively managing data for a firm?
Specify the information policy
web mining
The discovery and analysis of useful patterns and information from the web. It helps businesses understand customer behavior, evaluate the effectiveness of a particular website, or quantify the success of a marketing campaign. It looks for patterns in data through content mining, structure mining, and usage mining.
normalization
The process of streamlining complex groups of data to minimize redundant data elements and awkward many-to-many relationships and increase stability and flexibility
Which of the following need to be addressed in an organization's information policy?
Who is responsible for updating and maintaining the information; procedures and accountabilities around managing data resources; which users and organizational units can share information; where information can be distributed
data warehouse
a database that stores current and historical data of potential interest to decision makers throughout the company. The data originate in many core operational transaction systems, such as systems for sales, customer accounts, and manufacturing, and may include data from website transactions
byte
a group of 8 bits; represents a single character, which can be a letter, a number, or another symbol
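As a small illustration of one byte holding one character, the sketch below encodes a letter and prints its eight bits. It assumes ASCII encoding, in which each character occupies exactly one byte; other encodings can use more than one byte per character.

```python
char = "A"

# Encode the character as ASCII: the result is a bytes object of length 1.
encoded = char.encode("ascii")

# Show the single byte as its 8 individual bits.
bits = format(encoded[0], "08b")

print(len(encoded), bits)
```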
data lake
a repository for raw unstructured data or structured data that for the most part have not yet been analyzed, and the data can be accessed in many ways
data quality audit
a structured survey of the accuracy and level of completeness of the data in an information system. It can be performed by surveying entire data files, surveying samples from data files, or surveying end users for their perceptions of data quality.
data mart
a subset of a data warehouse in which a summarized or highly focused portion of the organization's data is placed in a separate database for a specific population of users
Data ________ is important because it establishes an organization's rules for sharing, disseminating, acquiring, standardizing, and classifying data.
administration
data cleansing
also known as data scrubbing; consists of activities for detecting and correcting data in a database that are incorrect, incomplete, improperly formatted, or redundant. It not only corrects data but also enforces consistency among different sets of data that originated in separate information systems.
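A minimal sketch of what a cleansing pass might look like, in plain Python. The record layout (name and email fields), the sample rows, and the specific rules are all assumptions chosen for illustration; a real cleansing tool would apply rules specific to the organization's data, and would usually flag incomplete records for correction rather than simply skipping them as this sketch does.

```python
def cleanse(records):
    """Fix formatting, skip incomplete rows, and remove redundant rows."""
    seen_emails = set()
    cleaned = []
    for rec in records:
        # Improperly formatted: collapse extra spaces and standardize case.
        name = " ".join(rec["name"].split()).title()
        email = rec["email"].strip().lower()
        # Incomplete or incorrect: no usable email address.
        if "@" not in email:
            continue
        # Redundant: the same email was already kept.
        if email in seen_emails:
            continue
        seen_emails.add(email)
        cleaned.append({"name": name, "email": email})
    return cleaned


# Hypothetical rows, as if merged from two separate systems.
raw = [
    {"name": "  ada   lovelace ", "email": "ADA@example.com"},
    {"name": "Ada Lovelace", "email": "ada@example.com "},
    {"name": "Grace Hopper", "email": ""},
]
cleaned = cleanse(raw)
print(cleaned)
```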
data dictionary
an automated or manual file that stores definitions of data elements and their characteristics.
Hadoop
an open source software framework managed by the Apache Software Foundation that enables distributed parallel processing of very large amounts of data across inexpensive computers. It breaks a big data problem down into subproblems, distributes them among up to thousands of inexpensive computer processing nodes, and then combines the result into a smaller data set that is easier to analyze.
Commercial database vendors have developed specialized high-speed
analytical platforms
text mining
analyzes unstructured data to find trends and patterns in words and sentences
Structured Query Language (SQL)
asks users to write lines of code to answer questions against a database
The types of information obtainable from data mining include
associations, sequences, classifications, clusters, and forecasts.
A computer system organizes data in a hierarchy that starts with
bits and bytes and progresses to fields, records, files, and databases
Sentiment Analysis
can mine text comments in an email message, blog, social media conversation, or survey form to detect favorable and unfavorable opinions about specific subjects
DBMS includes
capabilities and tools for organizing, managing, and accessing the data in the database. The most important are its data definition capability, data dictionary, and data manipulation language.
data definition
capability to specify the structure of the content of the database; used to create database tables and to define the characteristics of the fields in each table
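A data definition statement can be sketched with SQLite through Python's sqlite3 module. The STUDENT table and its fields are hypothetical names chosen for the example; the point is only that the statement defines structure (field names, types, constraints) and stores no data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data definition: create a table and describe the characteristics of each field.
conn.execute(
    """
    CREATE TABLE STUDENT (
        Student_ID INTEGER PRIMARY KEY,
        Name       TEXT NOT NULL,
        Age        INTEGER CHECK (Age >= 0)
    )
    """
)

# Read back the stored structure as (field name, declared type) pairs.
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(STUDENT)")]
print(columns)
```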
entity
categories representing a person, place, or thing on which we store information
attributes
characteristics or qualities that describe a particular entity
join operation
combines relational tables to provide the user with more information than is available in individual tables.
select operation
creates a subset consisting of all records in the file that meet stated criteria.
project operation
creates a subset consisting of columns in a table, permitting the user to create new tables that contain only the information required
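The three relational operations (select, project, join) can be sketched as SQL queries run through Python's sqlite3 module. The table layouts and sample rows below are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SUPPLIER (Supplier_Number INTEGER PRIMARY KEY, Supplier_Name TEXT);
CREATE TABLE PART (Part_Number INTEGER PRIMARY KEY, Part_Name TEXT,
                   Supplier_Number INTEGER);
INSERT INTO SUPPLIER VALUES (1, 'Acme'), (2, 'Bolt Co');
INSERT INTO PART VALUES (137, 'Door latch', 1),
                        (150, 'Door seal', 2),
                        (152, 'Compressor', 1);
""")

# Select: the subset of rows (records) that meet stated criteria.
selected = conn.execute(
    "SELECT * FROM PART WHERE Part_Number IN (137, 150) ORDER BY Part_Number"
).fetchall()

# Project: the subset of columns that the user actually needs.
projected = conn.execute(
    "SELECT Part_Number, Part_Name FROM PART ORDER BY Part_Number"
).fetchall()

# Join: combine two tables to show more than either table holds alone.
joined = conn.execute(
    """
    SELECT p.Part_Number, p.Part_Name, s.Supplier_Name
    FROM PART AS p
    JOIN SUPPLIER AS s ON s.Supplier_Number = p.Supplier_Number
    ORDER BY p.Part_Number
    """
).fetchall()

print(selected, projected, joined, sep="\n")
```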
the DBMS often resides on a dedicated computer called a
database server
blockchain
distributed database technology that enables firms and organizations to create and verify transactions on a network nearly instantaneously without a central authority. The system stores transactions as a distributed ledger among a network of computers. The information held in the database is continually reconciled by the computers in the network.
data governance
encompasses policies and procedures through which data can be managed as an organizational resource
Companies often build
enterprise-wide data warehouses, where a central data warehouse serves the entire organization, or they create smaller, decentralized warehouses called data marts.
There are a number of advantages to using the web to access an organization's internal databases
Everyone knows how to use web browser software, and employees require much less training than if they used proprietary query tools. The web interface also requires few or no changes to the internal database.
analytical platforms
feature preconfigured hardware-software systems that are specifically designed for query processing and analytics
Data ________ is important because it establishes an organization's rules for sharing, disseminating, acquiring, standardizing, and classifying data.
governance
database
group of related files
For handling unstructured and semistructured data in vast quantities, as well as structured data, organizations are using
Hadoop
key field
identifies each record so that the record can be retrieved, updated, or sorted
sequences
events linked over time
associations
occurrences linked to a single event.
distributed database
one that is stored in multiple physical locations. Parts or copies of the database are physically stored in one location and other parts or copies are maintained in other locations.
relational databases
organize data into two-dimensional tables (called relations) with columns and rows; most common type of database; Each table contains data about an entity and its attributes
logical view
presents data as end users or business specialists would perceive them
Each table in a relational database has one field designated as its
primary key; the unique identifier for all the information in any row of the table; cannot be duplicated
Data mining
provides insights into corporate data that cannot be obtained with OLAP by finding hidden patterns and relationships in large databases and inferring rules from them to predict future behavior
classification
recognizes patterns that describe the group to which an item belongs by examining existing items that have been classified and by inferring a set of rules
bit
represents the smallest unit of data a computer can handle
Referential Integrity
rules to ensure that relationships between coupled tables remain consistent.
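A referential integrity rule can be sketched in SQLite, which enforces foreign keys only after `PRAGMA foreign_keys = ON` is issued on the connection. The table and column names are assumptions for the example. The second PART row points at a supplier that does not exist, so the database rejects it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default

conn.execute("CREATE TABLE SUPPLIER (Supplier_Number INTEGER PRIMARY KEY)")
conn.execute(
    """
    CREATE TABLE PART (
        Part_Number     INTEGER PRIMARY KEY,
        Supplier_Number INTEGER NOT NULL REFERENCES SUPPLIER(Supplier_Number)
    )
    """
)

conn.execute("INSERT INTO SUPPLIER VALUES (1)")
conn.execute("INSERT INTO PART VALUES (137, 1)")  # consistent: supplier 1 exists

try:
    conn.execute("INSERT INTO PART VALUES (150, 99)")  # supplier 99 does not exist
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

part_count = conn.execute("SELECT COUNT(*) FROM PART").fetchone()[0]
print(rejected, part_count)
```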
physical view
shows how data are actually organized and structured on physical storage media, such as a hard disk.
Database Management System (DBMS)
specific type of software for creating, storing, organizing, and accessing data from a database. It relieves the end user or programmer from the task of understanding where and how the data are actually stored by separating the logical and physical views of the data.
OLAP (online analytical processing)
supports multidimensional data analysis, enabling users to view the same data in different ways using multiple dimensions. Each aspect of information—product, pricing, cost, region, or time period—represents a different dimension.
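Viewing the same data along different dimensions can be sketched in plain Python. This is not how an OLAP product is implemented; it only shows the idea of summarizing one set of facts by different combinations of dimensions. The sales figures and the `roll_up` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical sales facts; product, region, and quarter are the dimensions.
sales = [
    {"product": "bolts", "region": "East", "quarter": "Q1", "units": 100},
    {"product": "bolts", "region": "West", "quarter": "Q1", "units": 80},
    {"product": "nuts", "region": "East", "quarter": "Q1", "units": 50},
    {"product": "bolts", "region": "East", "quarter": "Q2", "units": 120},
]


def roll_up(rows, *dimensions):
    """Total the units for every combination of the chosen dimensions."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row["units"]
    return dict(totals)


# The same data viewed two different ways.
by_product = roll_up(sales, "product")
by_region_quarter = roll_up(sales, "region", "quarter")
print(by_product)
print(by_region_quarter)
```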
Hadoop consists of several key services
the Hadoop Distributed File System (HDFS) for data storage and MapReduce for high-performance parallel data processing
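The split, process, and combine pattern behind MapReduce can be sketched in plain Python with a word count. This is not the Hadoop API and it runs in a single process; the chunks stand in for data that Hadoop would distribute across many nodes, and the function names are made up for the example.

```python
from collections import Counter
from functools import reduce


def map_chunk(lines):
    """Map step: count the words in one chunk of the data."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def reduce_counts(left, right):
    """Reduce step: merge two partial results into one."""
    return left + right


documents = ["big data big", "data lake", "big warehouse"]

# Break the problem into subproblems (on a cluster, one chunk per node).
chunks = [documents[0:2], documents[2:3]]

# Each chunk is processed independently, then the results are combined
# into a smaller data set that is easier to analyze.
partials = [map_chunk(chunk) for chunk in chunks]
totals = reduce(reduce_counts, partials, Counter())
print(totals)
```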
Big data is often characterized by the "3Vs"
the extreme volume of data, the wide variety of data types and sources, and the velocity at which the data must be processed
NoSQL
nonrelational databases that use a more flexible data model and are designed for managing large data sets across many distributed machines and for easily scaling up or down. They are useful for accelerating simple queries against large volumes of structured and unstructured data, including web, social media, graphics, and other forms of data that are difficult to analyze with traditional SQL-based tools.
Data Manipulation Language (DML)
used to add, change, delete, and retrieve the data in the database. It contains commands that permit end users and programming specialists to extract data from the database to satisfy information requests and develop applications.
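The four kinds of data manipulation (add, change, retrieve, delete) can be sketched with SQL statements issued through Python's sqlite3 module. The GRADE table and its sample row are hypothetical; the CREATE TABLE line is data definition and is included only so the example runs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE GRADE (Student_ID INTEGER, Course TEXT, Grade TEXT)")

# Add a record.
conn.execute("INSERT INTO GRADE VALUES (?, ?, ?)", (1001, "MIS 101", "B"))

# Change a record.
conn.execute("UPDATE GRADE SET Grade = ? WHERE Student_ID = ?", ("A", 1001))

# Retrieve records to satisfy an information request.
rows = conn.execute(
    "SELECT Student_ID, Course, Grade FROM GRADE WHERE Student_ID = ?", (1001,)
).fetchall()

# Delete a record.
conn.execute("DELETE FROM GRADE WHERE Student_ID = ?", (1001,))
remaining = conn.execute("SELECT COUNT(*) FROM GRADE").fetchone()[0]

print(rows, remaining)
```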
forecasting
uses a series of existing values to forecast what other values will be
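One simple way to forecast from a series of existing values is to fit a straight-line trend and extend it. The sketch below uses ordinary least squares on hypothetical monthly sales; it is one possible technique chosen for illustration, not the method any particular data mining tool uses, and it needs at least two values.

```python
def linear_forecast(values, steps_ahead=1):
    """Fit a least-squares line to the series and extend it forward."""
    n = len(values)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    # Predict the value at the position steps_ahead past the last observation.
    return intercept + slope * (n - 1 + steps_ahead)


monthly_sales = [10, 12, 14, 16]  # hypothetical existing values
next_month = linear_forecast(monthly_sales)
print(next_month)
```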
Cloud-based data management services have special appeal for
web-focused startups or small to medium-sized businesses seeking database capabilities at a lower cost than in-house database products.
in-memory computing
relies primarily on a computer's main memory (RAM) for data storage. (Conventional DBMS use disk storage systems.) Users access data stored in the system's primary memory, thereby eliminating bottlenecks from retrieving and reading data in a traditional, disk-based database and dramatically shortening query response times.
clustering
works in a manner similar to classification when no groups have yet been defined
true statements about big data
"Big data" data sets are at least a petabyte in size. Big data can consist of multimedia files like graphics, audio, and video. Big data has a variety of data with structured data and free-form text and logs. It is generated rapidly.