Chapter 6
Normalization
the process of streamlining complex groups of data to minimize redundant elements and awkward many-to-many relationships and increase stability and flexibility
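A minimal sketch of the idea, using Python's built-in sqlite3 module (the supplier/order tables and all names here are hypothetical): instead of repeating supplier details on every order row, the normalized design stores each supplier once and references it.

```python
import sqlite3

# Hypothetical example: a denormalized order table would repeat supplier
# details on every row; the normalized design splits the data into two
# tables so each supplier is stored exactly once.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE supplier (supplier_id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE part_order (order_id INTEGER PRIMARY KEY, part TEXT, "
            "supplier_id INTEGER REFERENCES supplier(supplier_id))")
cur.execute("INSERT INTO supplier VALUES (1, 'Acme')")
cur.executemany("INSERT INTO part_order VALUES (?, ?, ?)",
                [(10, 'bolt', 1), (11, 'nut', 1)])
# The supplier's name appears once in SUPPLIER, not on every order row.
rows = cur.execute("SELECT p.part, s.name FROM part_order p "
                   "JOIN supplier s ON p.supplier_id = s.supplier_id "
                   "ORDER BY p.order_id").fetchall()
print(rows)  # [('bolt', 'Acme'), ('nut', 'Acme')]
```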
File Organization Hierarchy/Concepts
A computer system organizes data in a hierarchy that starts with bits and bytes and progresses to fields, records, files, and databases.
Online analytical processing (OLAP)
OLAP supports multidimensional data analysis, enabling users to view the same data in different ways using multiple dimensions. Each aspect of information (product, pricing, cost, region, or time period) represents a different dimension.
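A minimal sketch of viewing the same data along different dimensions, using SQL aggregation through Python's sqlite3 module (the sales figures and table names are hypothetical):

```python
import sqlite3

# The same sales data summarized along two dimensions: product and region.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE sales (product TEXT, region TEXT, amount REAL)")
cur.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("nut", "East", 100.0), ("nut", "West", 50.0),
    ("bolt", "East", 75.0), ("bolt", "West", 25.0),
])
by_product = cur.execute(
    "SELECT product, SUM(amount) FROM sales "
    "GROUP BY product ORDER BY product").fetchall()
by_region = cur.execute(
    "SELECT region, SUM(amount) FROM sales "
    "GROUP BY region ORDER BY region").fetchall()
print(by_product)  # [('bolt', 100.0), ('nut', 150.0)]
print(by_region)   # [('East', 175.0), ('West', 75.0)]
```

A full OLAP product precomputes these aggregations across many dimensions at once; the GROUP BY queries here only illustrate the underlying idea.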
File
a group of records of the same type
Data cleansing
also known as data scrubbing, consists of activities for detecting and correcting data in a database that are incorrect, incomplete, improperly formatted, or redundant. Data cleansing not only corrects data but also enforces consistency among different sets of data that originated in separate information systems. Specialized data-cleansing software is available to survey data files automatically, correct errors in the data, and integrate the data in a consistent, company-wide format
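The detection-and-correction activities described above can be sketched in plain Python (the customer records and cleaning rules here are hypothetical):

```python
# A toy data-cleansing pass: standardize formatting, drop incomplete
# records, and remove redundant (duplicate) records from a customer list.
raw = [
    {"name": " Alice ", "zip": "02139"},
    {"name": "alice",   "zip": "02139"},   # duplicate after cleansing
    {"name": "Bob",     "zip": ""},        # incomplete: missing ZIP code
    {"name": "Carol",   "zip": "10001"},
]

seen, clean = set(), []
for rec in raw:
    name = rec["name"].strip().title()     # enforce consistent formatting
    if not rec["zip"]:                     # detect incomplete records
        continue
    key = (name, rec["zip"])
    if key in seen:                        # detect redundant records
        continue
    seen.add(key)
    clean.append({"name": name, "zip": rec["zip"]})

print([r["name"] for r in clean])  # ['Alice', 'Carol']
```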
Data Dictionary
an automated or manual file that stores definitions of data elements and their characteristics (Microsoft Access)
Data Definition
capability to specify the structure of the content of the database; it is used to create database tables and to define the characteristics of the fields in each table
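In SQL, the data definition capability is the CREATE TABLE statement. A minimal sketch using Python's sqlite3 module (the student table and its fields are hypothetical):

```python
import sqlite3

# Data definition: CREATE TABLE names each field and gives its type.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,  -- unique identifier
        name       TEXT NOT NULL,
        grade      TEXT
    )
""")
# SQLite can report the field definitions back, much like a data dictionary.
cols = [row[1] for row in con.execute("PRAGMA table_info(student)")]
print(cols)  # ['student_id', 'name', 'grade']
```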
Big Data
data sets with volumes so huge that they are beyond the ability of a typical DBMS to capture, store, and analyze
Analytic Platforms
developed by commercial database vendors; a specialized high-speed platform that uses both relational and nonrelational technology and is optimized for analyzing large data sets (e.g., IBM PureData System for Analytics)
Entity
each of the generalized categories representing a person, place, or thing about which we store information
Database server
in a client/server environment, the DBMS resides on a dedicated computer called a database server. The DBMS receives the SQL requests and provides the required data. The information is then transferred from the organization's internal database back to the web server for delivery in the form of a web page to the user
Data quality audit
is a structured survey of the accuracy and level of completeness of the data in an information system. Data quality audits can be performed by surveying entire data files, surveying samples from data files, or surveying end users for their perceptions of data quality
Data mart
is a subset of a data warehouse in which a summarized or highly focused portion of the organization's data is placed in a separate database for a specific population of users
Hadoop
is an open-source software framework, managed by the Apache Software Foundation, that enables distributed parallel processing of huge amounts of data across inexpensive computers. It breaks a big data problem down into subproblems, distributes them among up to thousands of inexpensive computer processing nodes, and then combines the results into a smaller data set that is easier to analyze.
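The split-process-combine idea behind Hadoop's MapReduce model can be sketched in plain Python (a toy word count; real Hadoop distributes the map work across many machines):

```python
from collections import Counter

# MapReduce in miniature: split the problem into subproblems (map each
# document independently), then combine the partial results (reduce).
documents = ["big data big problems", "big answers"]

def map_phase(doc):
    # Each processing node would count words in its own chunk of data.
    return Counter(doc.split())

partials = [map_phase(d) for d in documents]  # would run in parallel

def reduce_phase(counters):
    # Combine per-node counts into one smaller, easier-to-analyze result.
    total = Counter()
    for c in counters:
        total += c
    return total

print(reduce_phase(partials))
# Counter({'big': 3, 'data': 1, 'problems': 1, 'answers': 1})
```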
Bit
represents the smallest unit of data a computer can handle
Referential integrity
rules to ensure that relationships between coupled tables remain consistent
Attributes
specific characteristics of each entity
Web mining
the discovery and analysis of useful patterns and information from the World Wide Web. Businesses might turn to web mining to help them understand customer behavior, evaluate the effectiveness of a particular website, or quantify the success of a marketing campaign
Data manipulation language
(a DBMS specialized language) is used to add, change, delete, and retrieve the data in the database. This language contains commands that permit end users and programming specialists to extract data from the database to satisfy information requests and develop applications
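The four core operations (add, change, delete, retrieve) map to SQL's INSERT, UPDATE, DELETE, and SELECT. A minimal sketch through Python's sqlite3 module, with a hypothetical parts table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE part (part_id INTEGER PRIMARY KEY, name TEXT)")
con.execute("INSERT INTO part VALUES (1, 'bolt')")                # add
con.execute("INSERT INTO part VALUES (2, 'nut')")
con.execute("UPDATE part SET name = 'washer' WHERE part_id = 2")  # change
con.execute("DELETE FROM part WHERE part_id = 1")                 # delete
rows = con.execute("SELECT * FROM part").fetchall()               # retrieve
print(rows)  # [(2, 'washer')]
```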
Nonrelational database management systems (NoSQL)
use a more flexible data model and are designed for managing large data sets across many distributed machines and for easily scaling up or down. They are useful for accelerating simple queries against large volumes of structured and unstructured data, including web, social media, graphics, and other forms of data that are difficult to analyze with traditional SQL-based tools
Entity-relationship diagram
(a schematic) clarifies table relationships in a relational database. The most important piece of information an entity-relationship diagram provides is the manner in which two tables are related to each other.
In-memory computing
(another way of facilitating big data analysis) relies primarily on a computer's main memory (RAM) for data storage, whereas conventional DBMS use disk storage systems. Users access data stored in system primary memory, thereby eliminating bottlenecks from retrieving and reading data in a traditional, disk-based database and dramatically shortening query response times. In-memory processing makes it possible for very large sets of data, amounting to the size of a data mart or small data warehouse, to reside entirely in memory.
Data mining
(discovery driven) provides insights into corporate data that cannot be obtained with OLAP by finding hidden patterns and relationships in large databases and inferring rules from them to predict future behavior. The patterns and rules are used to guide decision making and forecast the effect of those decisions. The types of information obtainable from data mining include associations, sequences, classifications, clusters and forecasts.
Report generator
DBMS typically include capabilities for report generation so that the data of interest can be displayed in a more structured and polished format than would be possible just by querying. Crystal Reports is a popular report generator for large corporate DBMS, although it can also be used with MS Access
Tuples
The actual information about a single supplier that resides in a table is called a row. Rows are commonly referred to as records or, in very technical terms, as tuples
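This terminology is visible in practice: Python's sqlite3 module returns each row of a query result as a tuple (the supplier data below is hypothetical).

```python
import sqlite3

# One record in a relational table comes back as one tuple.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE supplier (supplier_id INTEGER, name TEXT, city TEXT)")
con.execute("INSERT INTO supplier VALUES (8259, 'CBM Inc.', 'Dayton')")
row = con.execute("SELECT * FROM supplier").fetchone()
print(type(row).__name__, row)  # tuple (8259, 'CBM Inc.', 'Dayton')
```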
Data warehouse
a database that stores current and historical data of potential interest to decision makers throughout the company. The data originate in many core operational transaction systems, such as systems for sales, customer accounts, and manufacturing, and can include data from website transactions
Key field
a field that uniquely identifies each record so that the record can be retrieved, updated, or sorted
Byte
a group of bits-represents a single character which can be a letter, a number, or another symbol.
Record
a group of related fields, such as a student's ID number, the course taken, the date, and the grade
Field
a grouping of characters into a word, a group of words, or a complete number (such as a person's name or age)
Database administration
A large organization will also have a database design and management group within the corporate information systems division that is responsible for defining and organizing the structure and content of the database and maintaining it. In close cooperation with users, the design group establishes the physical database, the logical relations among elements, and the access rules and security procedures. The functions it performs are called database administration
Sentiment analysis
software that can mine text comments in an email message, blog, social media conversation, or survey form to detect favorable and unfavorable opinions about specific subjects
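A toy sketch of the simplest (lexicon-based) approach: score a comment by counting favorable versus unfavorable words. The word lists and scoring rule here are hypothetical; commercial sentiment tools are far more sophisticated.

```python
# Hypothetical opinion-word lists for a lexicon-based sentiment score.
POSITIVE = {"great", "love", "excellent"}
NEGATIVE = {"terrible", "hate", "slow"}

def sentiment(text):
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "favorable" if score > 0 else "unfavorable" if score < 0 else "neutral"

print(sentiment("I love this excellent product"))  # favorable
print(sentiment("terrible and slow service"))      # unfavorable
```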
Database Management System (DBMS)
a specific type of software for creating, storing, organizing, and accessing data from a database (MS Access is for desktop systems, whereas DB2, Oracle, and Microsoft SQL Server are for large mainframes and midrange computers)
Query
is a request for data from a database
Foreign key
a field in one table that refers to the primary key of another table; it is essentially a lookup field, used, for example, to find data about the supplier of a specific part
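A minimal sketch of the lookup through Python's sqlite3 module (the part and supplier data are hypothetical): the foreign key in PART lets a join pull the matching supplier row.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE supplier (supplier_id INTEGER PRIMARY KEY, name TEXT)")
con.execute("CREATE TABLE part (part_id INTEGER PRIMARY KEY, part_name TEXT, "
            "supplier_id INTEGER REFERENCES supplier(supplier_id))")
con.execute("INSERT INTO supplier VALUES (8259, 'CBM Inc.')")
con.execute("INSERT INTO part VALUES (137, 'Door latch', 8259)")
# The foreign key part.supplier_id looks up the supplier of part 137.
row = con.execute(
    "SELECT p.part_name, s.name FROM part p "
    "JOIN supplier s ON p.supplier_id = s.supplier_id "
    "WHERE p.part_id = 137").fetchone()
print(row)  # ('Door latch', 'CBM Inc.')
```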
Data administration
is responsible for the specific policies and procedures through which data can be managed as an organizational resource. These responsibilities include developing information policy, planning for data, overseeing logical database design and data dictionary development, and monitoring how information systems specialists and end-user groups use data
Relational database
the most common type of database today; organizes data into two-dimensional tables (called relations) with columns and rows
Structured Query Language (SQL)
most prominent data manipulation language today
Information policy
specifies the organization's rules for sharing, disseminating, acquiring, standardizing, classifying, and inventorying information. Information policies identify which users and organizational units can share information, where information can be distributed, and who is responsible for updating and maintaining the information
Primary key
this key field is the unique identifier for all the information in any row of the table, and this primary key cannot be duplicated (each table in a relational database has one field designated as its primary key)
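The no-duplicates rule is enforced by the DBMS itself, as a minimal sqlite3 sketch shows (the student table is hypothetical):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT)")
con.execute("INSERT INTO student VALUES (100, 'Alice')")

duplicate_rejected = False
try:
    con.execute("INSERT INTO student VALUES (100, 'Bob')")  # duplicate key
except sqlite3.IntegrityError:
    duplicate_rejected = True   # the primary key cannot be duplicated
print(duplicate_rejected)  # True
```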
Text mining
unstructured data, most in the form of text files, is believed to account for more than 80 percent of useful organizational information and is one of the major sources of big data that firms want to analyze. Email, memos, call center transcripts, survey responses, legal cases, patent descriptions, and service reports are all valuable for finding patterns and trends that will help employees make better business decisions. Text mining tools can extract key elements from unstructured big data sets, discover patterns and relationships, and summarize the information