Chapter 6 MIS

record

A group of related fields, such as a student's identification number (ID), the course taken, the date, and the grade

field

A grouping of characters into a word, a group of words, or a complete number (such as a person's name or age)

entity relationship diagram

A methodology for documenting databases illustrating the relationship between various entities in the database.

Foreign Key

A primary key of one table that appears as an attribute in another table and provides a logical relationship between the two tables. For example, a supplier's key field stored in a PART table is essentially a look-up field for finding data about the supplier of a specific part. The PART table also has its own primary key field, Part_Number, to identify each part uniquely.

Which statement about big data is FALSE?

Big data can be processed with traditional techniques.

True statements about Hadoop

Hadoop breaks a big data problem down into sub-problems.
Hadoop is an open source software framework.
Hadoop combines results into a smaller data set that is easier to analyze.
Hadoop's MapReduce was inspired by Google's system for processing huge data sets.

Which tool enables users to view the same data in different ways using multiple dimensions?

OLAP

What is the first step in effectively managing data for a firm?

Specify the information policy

web mining

The discovery and analysis of useful patterns and information from the web. It helps businesses understand customer behavior, evaluate the effectiveness of a particular website, or quantify the success of a marketing campaign. Web mining looks for patterns in data through content mining, structure mining, and usage mining.

normalization

The process of streamlining complex groups of data to minimize redundant data elements and awkward many-to-many relationships and increase stability and flexibility
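A minimal sketch of the idea, assuming a hypothetical flat order table in which every row repeats the customer's name and city. Splitting it into a customer table (stored once per customer ID) and an order table that keeps only the customer ID removes the redundant customer data. All field names here are illustrative.

```python
# Hypothetical denormalized rows: customer details repeat on every order.
flat_orders = [
    {"order_id": 1, "customer_id": "C1", "customer_name": "Ana", "city": "Manila", "item": "Pen"},
    {"order_id": 2, "customer_id": "C1", "customer_name": "Ana", "city": "Manila", "item": "Ink"},
    {"order_id": 3, "customer_id": "C2", "customer_name": "Ben", "city": "Cebu", "item": "Pad"},
]

def normalize(rows):
    """Split flat rows into a customer table and an order table."""
    customers = {}  # customer_id -> customer attributes, stored only once
    orders = []     # each order keeps just the customer_id as a reference
    for row in rows:
        customers[row["customer_id"]] = {
            "customer_name": row["customer_name"],
            "city": row["city"],
        }
        orders.append({
            "order_id": row["order_id"],
            "customer_id": row["customer_id"],
            "item": row["item"],
        })
    return customers, orders

customers, orders = normalize(flat_orders)
print(len(customers), len(orders))  # 2 3
```

Updating a customer's city now touches one entry instead of every order row. The sketch assumes the repeated customer data is already consistent; if two rows disagree, the last one seen wins.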

Which of the following needs to be addressed in an organization's information policy?

Who is responsible for updating and maintaining the information
Procedures and accountabilities around managing data resources
Which users and organizational units can share information
Where information can be distributed

data warehouse

a database that stores current and historical data of potential interest to decision makers throughout the company. The data originate in many core operational transaction systems, such as systems for sales, customer accounts, and manufacturing, and may include data from website transactions

byte

a group of 8 bits; represents a single character, which can be a letter, a number, or another symbol
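A short illustration, assuming ASCII text, where a basic Latin letter occupies exactly one byte. It shows the numeric code of one character and the 8 bits that make up its byte. The second part notes that in UTF-8, a character outside basic ASCII can take more than one byte.

```python
# Assumes ASCII/UTF-8 text, where a basic Latin letter fits in one byte.
char = "A"
byte_value = char.encode("ascii")[0]   # the character's numeric code: 65
bits = format(byte_value, "08b")       # the same byte written as 8 bits
print(char, byte_value, bits)          # A 65 01000001

# In UTF-8, characters outside basic ASCII can need more than one byte.
accented_bytes = len("é".encode("utf-8"))
print(accented_bytes)                  # 2
```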

data lake

a repository for raw unstructured data or structured data that for the most part have not yet been analyzed, and the data can be accessed in many ways

data quality audit

a structured survey of the accuracy and level of completeness of the data in an information system. It can be performed by surveying entire data files, surveying samples from data files, or surveying end users for their perceptions of data quality.

data mart

a subset of a data warehouse in which a summarized or highly focused portion of the organization's data is placed in a separate database for a specific population of users

Data ________ is important because it establishes an organization's rules for sharing, disseminating, acquiring, standardizing, and classifying data.

administration

data cleansing

also known as data scrubbing; consists of activities for detecting and correcting data in a database that are incorrect, incomplete, improperly formatted, or redundant. Data cleansing not only corrects data but also enforces consistency among different sets of data that originated in separate information systems.
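A minimal sketch of the activities listed above, assuming hypothetical customer records pulled from two separate systems. It standardizes formatting (spacing, capitalization, phone digits), flags incomplete records, and drops redundant duplicates. The field names and cleaning rules are illustrative assumptions, not a standard procedure.

```python
import re

# Hypothetical customer records gathered from two separate systems.
raw_records = [
    {"name": "  maria santos ", "phone": "(02) 555-1234"},
    {"name": "Maria Santos", "phone": "02 555 1234"},
    {"name": "JOSE RIZAL", "phone": ""},
]

def clean(record):
    """Standardize formatting so records from different systems are consistent."""
    name = " ".join(record["name"].split()).title()  # trim, collapse spaces, fix case
    phone = re.sub(r"\D", "", record["phone"])        # keep digits only
    return {"name": name, "phone": phone or None}     # mark a missing phone as None

def cleanse(records):
    """Clean every record, then drop exact duplicates that remain."""
    cleaned, seen = [], set()
    for rec in map(clean, records):
        key = (rec["name"], rec["phone"])
        if key in seen:  # redundant copy of a record already kept
            continue
        seen.add(key)
        cleaned.append(rec)
    return cleaned

result = cleanse(raw_records)
incomplete = [r for r in result if r["phone"] is None]  # records needing follow-up
print(result)
print(incomplete)
```

The two Maria Santos rows only become recognizable as duplicates after formatting is made consistent, which is why the sketch cleans before it deduplicates.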

data dictionary

an automated or manual file that stores definitions of data elements and their characteristics.

Hadoop

an open source software framework managed by the Apache Software Foundation that enables distributed parallel processing of very large amounts of data across inexpensive computers. It breaks a big data problem down into subproblems, distributes them among up to thousands of inexpensive computer processing nodes, and then combines the results into a smaller data set that is easier to analyze.
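The split, distribute, and combine pattern can be sketched as a single-machine MapReduce word count. This is only a simulation: real Hadoop runs the map and reduce tasks on separate nodes over data stored in HDFS, while here each input chunk simply stands in for one sub-problem. The function names are hypothetical, not Hadoop's API.

```python
from collections import defaultdict

def map_phase(chunk):
    """Map: emit (word, 1) for every word in one chunk of the input."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(mapped_outputs):
    """Group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)

# Each chunk stands in for a sub-problem that would go to a separate node.
chunks = ["big data big ideas", "data nodes process data"]
mapped = [map_phase(c) for c in chunks]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```

The final dictionary is the "smaller data set that is easier to analyze": many raw word occurrences reduced to one total per word.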

Commercial database vendors have developed specialized high-speed

analytical platforms

text mining

analyzes unstructured data to find trends and patterns in words and sentences

Structured Query Language (SQL)

asks users to write lines of code to answer questions against a database

The types of information obtainable from data mining include

associations, sequences, classifications, clusters, and forecasts.

A computer system organizes data in a hierarchy that starts with

bits and bytes and progresses to fields, records, files, and databases

Sentiment Analysis

can mine text comments in an email message, blog, social media conversation, or survey form to detect favorable and unfavorable opinions about specific subjects
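A deliberately simplified sketch of the idea, assuming a small hypothetical word list: count favorable and unfavorable words in each comment and label it by which kind dominates. Production sentiment tools use far larger lexicons or trained models and handle negation and context, which this sketch does not.

```python
# Hypothetical tiny lexicon; real tools use much larger lexicons or trained models.
POSITIVE = {"great", "love", "fast", "helpful"}
NEGATIVE = {"slow", "broken", "hate", "rude"}

def sentiment(text):
    """Label a comment favorable, unfavorable, or neutral by counting lexicon hits."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "favorable"
    if score < 0:
        return "unfavorable"
    return "neutral"

comments = [
    "Love the new app, support was helpful!",
    "Checkout is slow and the coupon field is broken.",
    "I ordered a blue shirt.",
]
labels = [sentiment(c) for c in comments]
print(labels)  # ['favorable', 'unfavorable', 'neutral']
```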

DBMS includes

capabilities and tools for organizing, managing, and accessing the data in the database. The most important are its data definition capability, data dictionary, and data manipulation language.

data definition

capability to specify the structure of the content of the database. It is used to create database tables and to define the characteristics of the fields in each table.

entity

categories representing a person, place, or thing on which we store information

attributes

characteristics

join operation

combines relational tables to provide the user with more information than is available in individual tables.

select operation

creates a subset consisting of all records in the file that meet stated criteria.

project operation

creates a subset consisting of columns in a table, permitting the user to create new tables that contain only the information required
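The three relational operations above can be tried with Python's built-in sqlite3 module: a SQL WHERE clause performs a select, the column list performs a project, and JOIN performs a join. A minimal sketch; the PART and SUPPLIER tables and their rows are hypothetical sample data.

```python
import sqlite3

# Hypothetical sample data; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE supplier (supplier_number INTEGER PRIMARY KEY, supplier_name TEXT);
    CREATE TABLE part (part_number INTEGER PRIMARY KEY, part_name TEXT,
                       unit_price REAL, supplier_number INTEGER);
    INSERT INTO supplier VALUES (1, 'Acme Parts'), (2, 'Beta Molds');
    INSERT INTO part VALUES (137, 'Door latch', 22.0, 1),
                            (145, 'Side mirror', 12.0, 2),
                            (150, 'Door molding', 6.0, 2);
""")

# Select: keep only the records (rows) that meet a stated criterion.
selected = conn.execute(
    "SELECT * FROM part WHERE unit_price > 10 ORDER BY part_number").fetchall()

# Project: keep only the columns (fields) that are needed.
projected = conn.execute(
    "SELECT part_number, part_name FROM part ORDER BY part_number").fetchall()

# Join: combine PART and SUPPLIER through the shared supplier_number field.
joined = conn.execute("""
    SELECT p.part_name, s.supplier_name
    FROM part AS p JOIN supplier AS s ON p.supplier_number = s.supplier_number
    ORDER BY p.part_number
""").fetchall()

print(selected)
print(projected)
print(joined)
```

The join result pairs each part with its supplier's name, information that neither table holds on its own.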

the DBMS often resides on a dedicated computer called a

database server

blockchain

distributed database technology that enables firms and organizations to create and verify transactions on a network nearly instantaneously without a central authority. The system stores transactions as a distributed ledger among a network of computers. The information held in the database is continually reconciled by the computers in the network.
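A minimal sketch of one ingredient of a blockchain ledger: each block stores a hash of the previous block, so altering an earlier recorded transaction breaks the chain and can be detected. It runs on a single machine and does not model the network of computers, distribution, or reconciliation described above; the block layout is a hypothetical simplification.

```python
import hashlib
import json

def block_hash(block):
    """Hash a block's contents deterministically with SHA-256."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def add_block(chain, transactions):
    """Append a block that records the hash of the block before it."""
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"index": len(chain), "prev_hash": prev, "transactions": transactions})

def is_valid(chain):
    """Every block must still match the hash stored by its successor."""
    return all(chain[i]["prev_hash"] == block_hash(chain[i - 1])
               for i in range(1, len(chain)))

ledger = []
add_block(ledger, [{"from": "A", "to": "B", "amount": 5}])
add_block(ledger, [{"from": "B", "to": "C", "amount": 2}])
add_block(ledger, [{"from": "C", "to": "A", "amount": 1}])
print(is_valid(ledger))                        # True

ledger[1]["transactions"][0]["amount"] = 200   # tamper with a recorded transaction
print(is_valid(ledger))                        # False
```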

data governance

encompasses policies and procedures through which data can be managed as an organizational resource

Companies often build

enterprise-wide data warehouses, where a central data warehouse serves the entire organization, or they create smaller, decentralized warehouses called data marts.

There are a number of advantages to using the web to access an organization's internal databases

Everyone knows how to use web browser software, so employees require much less training than if they used proprietary query tools. In addition, the web interface requires few or no changes to the internal database.

analytical platforms

feature preconfigured hardware-software systems that are specifically designed for query processing and analytics

Data ________ is important because it establishes an organization's rules for sharing, disseminating, acquiring, standardizing, and classifying data.

governance

database

group of related files

For handling unstructured and semistructured data in vast quantities, as well as structured data, organizations are using

Hadoop

key field

identifies each record so that the record can be retrieved, updated, or sorted

sequences

linked over time

associations

occurrences linked to a single event.
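Associations can be sketched by counting which items occur together in the same purchase event. A minimal sketch over hypothetical transactions, computing two common measures: support (share of all transactions containing the pair) and confidence (share of transactions with corn chips that also contain cola). The data and item names are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase transactions (each set is one checkout event).
transactions = [
    {"corn chips", "cola"},
    {"corn chips", "cola", "salsa"},
    {"bread", "milk"},
    {"corn chips", "salsa"},
    {"corn chips", "cola"},
]

pair_counts = Counter()
item_counts = Counter()
for basket in transactions:
    item_counts.update(basket)
    # Sort each basket so a pair always has the same key, e.g. ("cola", "corn chips").
    pair_counts.update(combinations(sorted(basket), 2))

n = len(transactions)
support = pair_counts[("cola", "corn chips")] / n                         # 3 / 5
confidence = pair_counts[("cola", "corn chips")] / item_counts["corn chips"]  # 3 / 4
print(support, confidence)  # 0.6 0.75
```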

distributed database

one that is stored in multiple physical locations. Parts or copies of the database are physically stored in one location and other parts or copies are maintained in other locations.

relational databases

organize data into two-dimensional tables (called relations) with columns and rows; the most common type of database. Each table contains data about an entity and its attributes.

logical view

presents data as end users or business specialists would perceive them

Each table in a relational database has one field designated as its

primary key; the unique identifier for all the information in any row of the table; cannot be duplicated

Data mining

provides insights into corporate data that cannot be obtained with OLAP by finding hidden patterns and relationships in large databases and inferring rules from them to predict future behavior

classification

recognizes patterns that describe the group to which an item belongs by examining existing items that have been classified and by inferring a set of rules

bit

represents the smallest unit of data a computer can handle

Referential Integrity

rules to ensure that relationships between coupled tables remain consistent.
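Referential integrity is typically enforced through foreign keys. A minimal sketch using Python's sqlite3 module with hypothetical PART and SUPPLIER tables: once foreign key checking is switched on, SQLite rejects a part that points to a nonexistent supplier and refuses to delete a supplier that parts still reference. The `try_statement` helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE supplier (supplier_number INTEGER PRIMARY KEY, supplier_name TEXT)")
conn.execute("""CREATE TABLE part (
    part_number INTEGER PRIMARY KEY,
    part_name TEXT,
    supplier_number INTEGER REFERENCES supplier(supplier_number))""")
conn.execute("INSERT INTO supplier VALUES (1, 'Acme Parts')")
conn.execute("INSERT INTO part VALUES (137, 'Door latch', 1)")  # valid: supplier 1 exists

def try_statement(sql):
    """Run one statement and report whether the integrity rules allowed it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"

print(try_statement("INSERT INTO part VALUES (150, 'Door molding', 99)"))  # rejected: no supplier 99
print(try_statement("DELETE FROM supplier WHERE supplier_number = 1"))     # rejected: part 137 uses it
```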

physical view

shows how data are actually organized and structured on physical storage media, such as a hard disk.

Database Management System (DBMS)

specific type of software for creating, storing, organizing, and accessing data from a database. It relieves the end user or programmer from the task of understanding where and how the data are actually stored by separating the logical and physical views of the data.

OLAP (online analytical processing)

supports multidimensional data analysis, enabling users to view the same data in different ways using multiple dimensions. Each aspect of information—product, pricing, cost, region, or time period—represents a different dimension.
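A minimal sketch of viewing the same data along different dimensions, assuming hypothetical sales rows with three dimensions (product, region, quarter) and one measure (units). Real OLAP tools precompute and index such aggregates in a multidimensional cube; this only shows the roll-up idea.

```python
from collections import defaultdict

# Hypothetical sales facts: three dimensions and one measure per row.
sales = [
    {"product": "Nuts", "region": "East", "quarter": "Q1", "units": 50},
    {"product": "Nuts", "region": "West", "quarter": "Q1", "units": 60},
    {"product": "Bolts", "region": "East", "quarter": "Q2", "units": 40},
    {"product": "Bolts", "region": "West", "quarter": "Q2", "units": 70},
    {"product": "Nuts", "region": "East", "quarter": "Q2", "units": 30},
]

def rollup(rows, *dimensions):
    """Total the units measure along whichever dimensions the user chooses."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row["units"]
    return dict(totals)

# The same data, viewed three different ways.
by_product = rollup(sales, "product")
by_region = rollup(sales, "region")
by_product_region = rollup(sales, "product", "region")
print(by_product)
print(by_region)
print(by_product_region)
```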

Hadoop consists of several key services

the Hadoop Distributed File System (HDFS) for data storage and MapReduce for high-performance parallel data processing

Big data is often characterized by the "3Vs"

the extreme volume of data, the wide variety of data types and sources, and the velocity at which the data must be processed

NoSQL

use a more flexible data model and are designed for managing large data sets across many distributed machines and for easily scaling up or down. They are useful for accelerating simple queries against large volumes of structured and unstructured data, including web, social media, graphics, and other forms of data that are difficult to analyze with traditional SQL-based tools.

Data Manipulation Language (DML)

used to add, change, delete, and retrieve the data in the database. It contains commands that permit end users and programming specialists to extract data from the database to satisfy information requests and develop applications.

forecasting

uses a series of existing values to forecast what other values will be
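One simple way to forecast from a series of existing values is to fit a straight trend line by least squares and extend it forward. This is only one technique among many (moving averages, exponential smoothing, and so on); the monthly figures are hypothetical.

```python
def linear_forecast(values, steps_ahead=1):
    """Fit a least-squares straight line to past values and extend it forward."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two past values to fit a trend")
    xs = range(n)                 # time index 0, 1, ..., n - 1
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return intercept + slope * (n - 1 + steps_ahead)

monthly_sales = [100, 110, 120, 130]
print(linear_forecast(monthly_sales))  # 140.0
```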

Cloud-based data management services have special appeal for

web-focused startups or small to medium-sized businesses seeking database capabilities at a lower cost than in-house database products.

in-memory computing

relies primarily on a computer's main memory (RAM) for data storage (conventional DBMS use disk storage systems). Users access data stored in the system's primary memory, thereby eliminating bottlenecks from retrieving and reading data in a traditional, disk-based database and dramatically shortening query response times.

clustering

works in a manner similar to classification when no groups have yet been defined
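Clustering can be sketched with basic k-means, one common clustering algorithm (the card itself does not name a specific one). No groups are given in advance; the algorithm repeatedly assigns each point to its nearest center and moves each center to the mean of its members. The spending figures and starting centers are hypothetical assumptions.

```python
def kmeans_1d(points, centers, iterations=10):
    """Group points around k centers with no predefined labels (basic k-means)."""
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its members (keep it if the cluster is empty).
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical annual spending per customer; no segments are defined beforehand.
spending = [120, 150, 130, 900, 950, 1000]
centers, groups = kmeans_1d(spending, centers=[0.0, 500.0])
print(groups)  # [[120, 150, 130], [900, 950, 1000]]
```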

True statements about big data

"Big data" data sets are at least a petabyte in size.
Big data can consist of multimedia files like graphics, audio, and video.
Big data has a variety of data, with structured data and free-form text and logs.
Big data is generated rapidly.
