Chapter 6

Normalization

the process of streamlining complex groups of data to minimize redundant data elements and awkward many-to-many relationships, and to increase stability and flexibility
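
As a rough illustration (the table and field names here are hypothetical, not from the text), a denormalized table that repeats supplier details on every order row can be split into two related tables:

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Denormalized: supplier name/city repeat on every order row.
con.execute("""CREATE TABLE orders_flat (
    order_id INTEGER, part TEXT,
    supplier_name TEXT, supplier_city TEXT)""")

# Normalized: supplier details stored once, referenced by key.
con.execute("""CREATE TABLE supplier (
    supplier_id INTEGER PRIMARY KEY,
    name TEXT, city TEXT)""")
con.execute("""CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY, part TEXT,
    supplier_id INTEGER REFERENCES supplier(supplier_id))""")
```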

File Organization Hierarchy/Concepts

A computer system organizes data in a hierarchy that starts with bits and bytes and progresses to fields, records, files, and databases.
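
A quick way to see the bottom of this hierarchy (a toy Python sketch, not from the text):

```python
# One character is stored as a byte: a group of 8 bits.
char = "A"
byte = format(ord(char), "08b")   # the bits that make up the byte
print(char, "->", byte)           # A -> 01000001

# Characters group into a field, and fields into a record.
field = "Smith"                              # a field
record = ("39044", "IS 101", "F13", "B+")    # a record of related fields
```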

Online analytical processing (OLAP)

OLAP supports multidimensional data analysis, enabling users to view the same data in different ways using multiple dimensions. Each aspect of information (product, pricing, cost, region, or time period) represents a different dimension.
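
A small sketch of the idea using pandas (the sales figures and dimension names are made up):

```python
import pandas as pd

sales = pd.DataFrame({
    "product": ["Nut", "Nut", "Bolt", "Bolt"],
    "region":  ["East", "West", "East", "West"],
    "units":   [120, 80, 200, 150],
})

# View the same data along two dimensions at once.
cube = pd.pivot_table(sales, values="units",
                      index="product", columns="region",
                      aggfunc="sum")
print(cube)
```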

File

a group of records of the same type

Data cleansing

also known as data scrubbing, consists of activities for detecting and correcting data in a database that are incorrect, incomplete, improperly formatted, or redundant. Data cleansing not only corrects data but also enforces consistency among different sets of data that originated in separate information systems. Specialized data-cleansing software is available to survey data files automatically, correct errors in the data, and integrate the data in a consistent, company-wide format.
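
A minimal, hypothetical example of the kind of work such software automates: fixing inconsistent formatting and dropping duplicates in Python.

```python
records = [
    {"name": " Alice Smith ", "state": "ny"},
    {"name": "ALICE SMITH",   "state": "NY"},   # duplicate, formatted differently
    {"name": "Bob Jones",     "state": "ca"},
]

def cleanse(rec):
    # Enforce one consistent format for each field.
    return {"name": rec["name"].strip().title(),
            "state": rec["state"].strip().upper()}

cleaned = {tuple(sorted(cleanse(r).items())) for r in records}
print(cleaned)   # two unique, consistently formatted records
```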

Data Dictionary

an automated or manual file that stores definitions of data elements and their characteristics (Microsoft Access)

Data Definition

the capability to specify the structure of the content of the database; it is used to create database tables and to define the characteristics of the fields in each table
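
For example, in SQL (run here through Python's sqlite3; the COURSE table is hypothetical), a data definition statement creates a table and defines each field's characteristics:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE course (
        course_id   TEXT PRIMARY KEY,   -- key field
        title       TEXT NOT NULL,      -- character field
        credits     INTEGER             -- numeric field
    )
""")
```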

Big Data

data sets with volumes so huge that they are beyond the ability of a typical DBMS to capture, store, and analyze

Analytic Platforms

developed by commercial database vendors, a specialized high-speed platform that uses both relational and nonrelational technology and is optimized for analyzing large data sets (e.g., IBM PureData System for Analytics)

Entity

each of the generalized categories representing a person, place, or thing in which we store information

Database server

in a client/server environment, the DBMS resides on a dedicated computer called a database server. The DBMS receives the SQL requests and provides the required data. The information is transferred from the organization's internal database back to the web server for delivery in the form of a web page to the user.

Data quality audit

is a structured survey of the accuracy and level of completeness of the data in an information system. Data quality audits can be performed by surveying entire data files, surveying samples from data files, or surveying end users for their perceptions of data quality.

Data mart

is a subset of a data warehouse in which a summarized or highly focused portion of the organization's data is placed in a separate database for a specific population of users

Hadoop

is an open-source software framework, managed by the Apache Software Foundation, that enables distributed parallel processing of huge amounts of data across inexpensive computers. It breaks a big data problem down into subproblems, distributes them among up to thousands of inexpensive computer processing nodes, and then combines the results into a smaller data set that is easier to analyze.
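
Hadoop's MapReduce style can be sketched in miniature in plain Python (a toy word count, not Hadoop itself): the problem is split into pieces, each piece is processed independently, and the partial results are combined.

```python
from collections import Counter
from multiprocessing import Pool

def map_count(chunk):
    # Subproblem: count words in one chunk of the data.
    return Counter(chunk.split())

if __name__ == "__main__":
    chunks = ["big data big", "data analysis", "big analysis"]
    with Pool(3) as pool:                 # distribute across worker processes
        partials = pool.map(map_count, chunks)
    total = sum(partials, Counter())      # combine into a smaller result set
    print(total)
```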

Bit

represents the smallest unit of data a computer can handle

Referential integrity

rules to ensure that relationships between coupled tables remain consistent
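
In SQL these rules are declared with foreign key constraints; a minimal sqlite3 sketch with hypothetical PART and SUPPLIER tables:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")   # SQLite enforces FKs only when asked
con.execute("CREATE TABLE supplier (supplier_id INTEGER PRIMARY KEY)")
con.execute("""CREATE TABLE part (
    part_id INTEGER PRIMARY KEY,
    supplier_id INTEGER REFERENCES supplier(supplier_id))""")

con.execute("INSERT INTO supplier VALUES (1)")
con.execute("INSERT INTO part VALUES (10, 1)")      # fine: supplier 1 exists
try:
    con.execute("INSERT INTO part VALUES (11, 99)") # no supplier 99
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```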

Attributes

specific characteristics of each entity

Web mining

the discovery and analysis of useful patterns and information from the World Wide Web. Businesses might turn to web mining to help them understand customer behavior, evaluate the effectiveness of a particular website, or quantify the success of a marketing campaign.

Data manipulation language

(a specialized DBMS language) is used to add, change, delete, and retrieve the data in the database. This language contains commands that permit end users and programming specialists to extract data from the database to satisfy information requests and develop applications.
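
SQL's core data manipulation commands, sketched against a hypothetical STUDENT table via sqlite3:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, grade TEXT)")

con.execute("INSERT INTO student VALUES (1, 'Alice', 'B')")       # add
con.execute("UPDATE student SET grade = 'A' WHERE id = 1")        # change
print(con.execute("SELECT name, grade FROM student").fetchall())  # retrieve
con.execute("DELETE FROM student WHERE id = 1")                   # delete
```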

Nonrelational database management systems (NoSQL)

use a more flexible data model and are designed for managing large data sets across many distributed machines and for easily scaling up or down. They are useful for accelerating simple queries against large volumes of structured and unstructured data, including web, social media, graphics, and other forms of data that are difficult to analyze with traditional SQL-based tools.
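
The flexibility comes from storing items as self-describing documents rather than fixed rows; a toy sketch with Python dictionaries (document stores such as MongoDB build on the same principle):

```python
# Two "documents" in one collection need not share a schema.
customers = [
    {"name": "Alice", "email": "alice@example.com"},
    {"name": "Bob", "twitter": "@bob", "tags": ["vip", "web"]},
]

# A simple query against whatever fields happen to exist:
vips = [c for c in customers if "vip" in c.get("tags", [])]
print(vips)
```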

Entity-relationship diagram

(a schematic) clarifies table relationships in a relational database. The most important piece of information an entity-relationship diagram provides is the manner in which two tables are related to each other.

In-memory computing

(another way of facilitating big data analysis) relies primarily on a computer's main memory (RAM) for data storage, whereas traditional DBMS use disk storage systems. Users access data stored in the system's primary memory, thereby eliminating bottlenecks from retrieving and reading data in a traditional, disk-based database and dramatically shortening query response times. In-memory processing makes it possible for very large sets of data, amounting to the size of a data mart or small data warehouse, to reside entirely in memory.
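
SQLite offers a small-scale taste of the idea: connecting to ":memory:" keeps the whole database in RAM rather than on disk (a toy illustration, not an enterprise in-memory analytics platform):

```python
import sqlite3

# The entire database lives in main memory; nothing touches disk.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (x INTEGER)")
con.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1000)])
print(con.execute("SELECT COUNT(*), SUM(x) FROM t").fetchone())
```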

Data mining

(discovery-driven) provides insights into corporate data that cannot be obtained with OLAP by finding hidden patterns and relationships in large databases and inferring rules from them to predict future behavior. The patterns and rules are used to guide decision making and forecast the effect of those decisions. The types of information obtainable from data mining include associations, sequences, classifications, clusters, and forecasts.
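
A toy sketch of one data mining output, associations (items that occur together), in plain Python with made-up purchase data:

```python
from itertools import combinations
from collections import Counter

baskets = [{"chips", "soda"}, {"chips", "soda", "salsa"},
           {"bread", "soda"}, {"chips", "salsa"}]

# Count how often each pair of items is purchased together.
pairs = Counter()
for basket in baskets:
    pairs.update(combinations(sorted(basket), 2))

print(pairs.most_common(2))   # the strongest associations in the data
```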

Report generator

DBMS typically include capabilities for report generation so that the data of interest can be displayed in a more structured and polished format than would be possible just by querying. Crystal Reports is a popular report generator for large corporate DBMS, although it can also be used with MS Access.

Tuples

The actual information about a single supplier that resides in a table is called a row. Rows are commonly referred to as records or, in very technical terms, as tuples.

Data warehouse

a database that stores current and historical data of potential interest to decision makers throughout the company. The data originate in many core operational transaction systems, such as systems for sales, customer accounts, and manufacturing, and can include data from website transactions

Key field

a field that uniquely identifies each record so that the record can be retrieved, updated, or sorted

Byte

a group of bits that represents a single character, which can be a letter, a number, or another symbol

Record

a group of related fields, such as a student's ID number, the course taken, the date, and the grade

Field

a grouping of characters into a word, a group of words, or a complete number (such as a person's name or age)

Database administration

A large organization will also have a database design and management group within the corporate information systems division that is responsible for defining and organizing the structure and content of the database and maintaining it. In close cooperation with users, the design group establishes the physical database, the logical relations among elements, and the access rules and security procedures. The functions it performs are called database administration.

Sentiment analysis

software that can mine text comments in an email message, blog, social media conversation, or survey form to detect favorable and unfavorable opinions about specific subjects
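
A deliberately tiny lexicon-based sketch (real sentiment tools use far richer models):

```python
POSITIVE = {"great", "love", "excellent"}
NEGATIVE = {"poor", "hate", "broken"}

def sentiment(text):
    # Score a comment by counting positive vs. negative words.
    words = set(text.lower().split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    return "favorable" if score > 0 else "unfavorable" if score < 0 else "neutral"

print(sentiment("Great product, love it"))   # favorable
```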

Database Management System (DBMS)

a specific type of software for creating, storing, organizing, and accessing data from a database (MS Access for desktop systems, whereas DB2, Oracle Database, and Microsoft SQL Server are for large mainframes and midrange computers)

Query

is a request for data from a database

Foreign key

a field in one table that references the primary key of another table; in the supplier/part example, it is essentially a look-up field to find data about the supplier of a specific part

Data administration

is responsible for the specific policies and procedures through which data can be managed as an organizational resource. These responsibilities include developing information policy, planning for data, overseeing logical database design and data dictionary development, and monitoring how information systems specialists and end-user groups use data.

Relational database

the most common type of database today; it organizes data into two-dimensional tables (called relations) with columns and rows

Structured Query Language (SQL)

the most prominent data manipulation language today
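
For example, the classic SQL statement for retrieving data, including a join across two related tables (hypothetical PART and SUPPLIER tables, run via sqlite3):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE supplier (supplier_id INTEGER PRIMARY KEY, name TEXT)")
con.execute("CREATE TABLE part (part_id INTEGER PRIMARY KEY, part_name TEXT, supplier_id INTEGER)")
con.execute("INSERT INTO supplier VALUES (1, 'CBM Inc.')")
con.execute("INSERT INTO part VALUES (137, 'Door latch', 1)")

rows = con.execute("""
    SELECT part.part_name, supplier.name
    FROM part JOIN supplier
      ON part.supplier_id = supplier.supplier_id
""").fetchall()
print(rows)   # [('Door latch', 'CBM Inc.')]
```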

Information policy

specifies the organization's rules for sharing, disseminating, acquiring, standardizing, classifying, and inventorying information. information policies identify which users and organizational units can share information, where information can be distributed, and who is responsible for updating and maintaining the information

Primary key

this key field is the unique identifier for all the information in any row of the table, and this primary key cannot be duplicated (each table in a relational database has one field designated as its primary key)

Text mining

unstructured data, mostly in the form of text files, is believed to account for more than 80 percent of useful organizational information and is one of the major sources of big data that firms want to analyze. Email, memos, call center transcripts, survey responses, legal cases, patent descriptions, and service reports are all valuable for finding patterns and trends that will help employees make better business decisions. Text mining tools can extract key elements from unstructured big data sets, discover patterns and relationships, and summarize the information.
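
A toy extraction of key terms from unstructured text in plain Python (real text mining tools go much further):

```python
from collections import Counter

report = """customer reported broken latch. replacement latch shipped.
customer satisfied with replacement."""

# Strip punctuation, drop filler words, and count the rest.
STOPWORDS = {"with", "the", "a"}
words = [w.strip(".,").lower() for w in report.split()
         if w.lower() not in STOPWORDS]
print(Counter(words).most_common(3))   # frequent terms hint at the key issues
```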

