Ch 6
fields
(columns) store data representing an attribute
Entity
Generalized category representing a person, place, or thing on which we store and maintain information, e.g. SUPPLIER or PART
Hadoop
Open-source software framework that enables distributed parallel processing of huge amounts of data across many inexpensive computers. Breaks big data down into smaller data sets that are easier to analyze
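Example (a minimal sketch of the split / process / combine idea only; it uses plain Python and runs on one machine, it is not Hadoop's API, and the sample lines and function names are invented for illustration):

```python
from collections import Counter
from functools import reduce


def split(records, n_chunks):
    """Break one data set into n_chunks smaller data sets."""
    return [records[i::n_chunks] for i in range(n_chunks)]


def map_chunk(chunk):
    """Count words in one chunk; in a real cluster each chunk goes to a different node."""
    return Counter(word for line in chunk for word in line.lower().split())


def reduce_counts(left, right):
    """Merge two partial results into one."""
    return left + right


# Hypothetical input data.
lines = ["big data", "Big clusters", "data nodes data"]

partials = [map_chunk(chunk) for chunk in split(lines, 2)]
totals = reduce(reduce_counts, partials, Counter())
```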
Attributes
Specific characteristics of each entity, e.g. SUPPLIER: name, address; PART: description, unit price, supplier
Which of the following best illustrates the relationship between entities and attributes?
The entity CUSTOMER with the attribute ADDRESS
Database
a collection of related files containing records on people, places, or things
A schematic of the entire database that describes the relationships in a database is called a(n):
entity-relationship diagram
In-Memory Computing
relies on computer's main memory (RAM) for physical data storage, eliminates bottlenecks, shortens query response times
In a table for customers, the information about a single customer would reside in a single:
row
Operations of a Relational DBMS
• select - creates a subset of all records (rows) meeting stated criteria • join - combines relational tables to present the user with more information than is available from individual tables • project - creates a subset consisting of columns in a table, permits user to create new tables containing only desired information
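Example (a minimal sketch of select, join, and project using Python's built-in sqlite3 module; the supplier and part tables, their columns, and the sample rows are assumptions made up for illustration):

```python
import sqlite3

# Hypothetical SUPPLIER and PART tables held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE supplier (supplier_id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE part (part_id INTEGER PRIMARY KEY, description TEXT,
                       unit_price REAL, supplier_id INTEGER);
    INSERT INTO supplier VALUES (1, 'Acme', 'Dayton'), (2, 'Bolt Co', 'Akron');
    INSERT INTO part VALUES (137, 'Door latch', 22.00, 1),
                            (150, 'Door seal', 6.00, 2),
                            (152, 'Compressor', 54.00, 1);
""")

# select: keep only the rows (records) that meet a criterion.
selected = conn.execute(
    "SELECT * FROM part WHERE unit_price > 10 ORDER BY part_id"
).fetchall()

# join: combine the two tables through the shared supplier_id column.
joined = conn.execute(
    """SELECT part.part_id, part.description, supplier.name
       FROM part JOIN supplier ON part.supplier_id = supplier.supplier_id
       ORDER BY part.part_id"""
).fetchall()

# project: keep only the columns of interest.
projected = conn.execute(
    "SELECT description, unit_price FROM part ORDER BY part_id"
).fetchall()
```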
Query and Reporting
-Data Manipulation Language: Used to add, change, delete, retrieve data from database. Example: Structured Query Language (SQL) -DBMS typically have report generation capabilities for creating polished reports.
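Example (a sketch of the four data manipulation actions, add, change, delete, retrieve, written as SQL run through Python's sqlite3 module; the customer table and its rows are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, address TEXT)"
)

# add: INSERT new records.
conn.execute("INSERT INTO customer VALUES (?, ?, ?)", (1, "Ana Ruiz", "12 Elm St"))
conn.execute("INSERT INTO customer VALUES (?, ?, ?)", (2, "Ben Cho", "9 Oak Ave"))

# change: UPDATE an existing record.
conn.execute("UPDATE customer SET address = ? WHERE customer_id = ?", ("40 Pine Rd", 1))

# delete: DELETE a record.
conn.execute("DELETE FROM customer WHERE customer_id = ?", (2,))

# retrieve: SELECT what is left.
rows = conn.execute("SELECT customer_id, name, address FROM customer").fetchall()
```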
Data Marts
-Subset of data warehouse -Summarized or highly focused portion of firm's data for use by specific population of users -Typically focuses on single subject or line of business
BI Infrastructure
-data warehouses -data marts -Hadoop -In-Memory Computing -analytical platforms
Challenges of Big Data
-massive quantities of unstructured and semi-structured data from the internet - Volume, Velocity, Variety -big datasets offer more patterns/insights than smaller datasets
Blockchain
A distributed and decentralized ledger that records and verifies transactions and ownership, enables firms to create/verify transactions on a network
distributed database
A logically related database that is stored over two or more physically independent sites.
foreign key
A primary key of one table that appears as an attribute in another table and acts to provide a logical relationship between the two tables
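Example (a minimal sketch, assuming hypothetical supplier and part tables: part.supplier_id is a foreign key pointing at the primary key of supplier, so a part that names a nonexistent supplier is rejected; SQLite only enforces this when the foreign_keys pragma is switched on):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite checks foreign keys only when this pragma is enabled on the connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE supplier (supplier_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE part (
        part_id INTEGER PRIMARY KEY,
        description TEXT,
        supplier_id INTEGER REFERENCES supplier(supplier_id)
    );
    INSERT INTO supplier VALUES (1, 'Acme');
    INSERT INTO part VALUES (137, 'Door latch', 1);
""")

# Supplier 99 does not exist, so the logical link would be broken.
try:
    conn.execute("INSERT INTO part VALUES (150, 'Door seal', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

part_count = conn.execute("SELECT COUNT(*) FROM part").fetchone()[0]
```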
Capabilities of DBMS
Data definition capabilities, Data dictionary, querying and reporting
database administration
Database design and management group responsible for defining and organizing the structure and content of the database, and maintaining the database.
Data Warehouses
Database that stores current and historical data that may be of interest to decision makers; the data can be accessed but not altered
data mining
Finds hidden patterns and relationships in large databases and infers rules from them to predict future behavior
Non-Relational Databases
Handle large data sets that are not easily organized into tables, columns, and rows, including unstructured data. Example: Amazon's SimpleDB
data administration
Responsible for specific policies and procedures through which data can be managed as a resource
A foreign key is a field that links to a separate table.
True
Every record in a file must contain at least one key field.
True
text mining
Extracts key elements and patterns from unstructured data (mostly text files), which is said to account for 80 percent of an organization's useful information. Example: sentiment analysis of online comments
Data Dictionary
automated or manual file storing definitions of data elements and their characteristics
The smallest unit of data a computer can handle is called a:
bit
primary key
A unique identifier for each record in a table. Its value can't be duplicated, changed, or used for anything else.
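Example (a small sketch of the "can't be duplicated" rule, using Python's sqlite3 module and a hypothetical customer table; the second insert reuses customer_id 1 and is refused by the database):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customer VALUES (1, 'Ana Ruiz')")

# A second record with the same primary key value violates uniqueness.
try:
    conn.execute("INSERT INTO customer VALUES (1, 'Ben Cho')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

names = [row[0] for row in conn.execute("SELECT name FROM customer")]
```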
Database Management System (DBMS)
creates, reads, updates, and deletes data in a database while controlling access and security
web mining
Discovery and analysis of useful patterns and information from the World Wide Web. Three types: content mining (of websites), structure mining (like links), usage mining (user interactions gathered by web servers)
physical view
how data is actually structured and organized
The process of streamlining data to minimize redundancy and awkward many-to-many relationships is called:
normalization
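Example (a simplified sketch of the idea behind normalization, not a full normal-form procedure: supplier details that repeat on every part row are moved into their own table and referenced by key; all field names and rows are hypothetical):

```python
# Unnormalized rows: the supplier name and city are repeated for every part.
flat_rows = [
    {"part_id": 137, "description": "Door latch",
     "supplier_id": 1, "supplier_name": "Acme", "supplier_city": "Dayton"},
    {"part_id": 152, "description": "Compressor",
     "supplier_id": 1, "supplier_name": "Acme", "supplier_city": "Dayton"},
    {"part_id": 150, "description": "Door seal",
     "supplier_id": 2, "supplier_name": "Bolt Co", "supplier_city": "Akron"},
]


def normalize(rows):
    """Split the flat rows into a supplier table and a part table."""
    suppliers = {}  # supplier_id -> supplier attributes, stored once
    parts = []      # each part keeps only the supplier_id as a foreign key
    for row in rows:
        suppliers[row["supplier_id"]] = {
            "name": row["supplier_name"],
            "city": row["supplier_city"],
        }
        parts.append({
            "part_id": row["part_id"],
            "description": row["description"],
            "supplier_id": row["supplier_id"],
        })
    return suppliers, parts


suppliers, parts = normalize(flat_rows)
```

After the split, a change to a supplier's city is made in one place instead of on every part row.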
A typical business relational database contains:
one table for each entity
relational databases
organize data into two-dimensional tables (relations) of columns and rows; each table holds data on one entity, and tables are related to one another through shared fields
The logical view of a database:
presents data as they would be perceived by end users.
A field identified in a table as holding the unique identifier of the table's records is called the:
primary key
Data Definition Capabilities
specify the structure of the content of the database, describing what each attribute (column header) holds
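Example (a sketch of data definition in SQL via Python's sqlite3 module: CREATE TABLE declares each column and its type, and SQLite's table_info pragma reads the stored definition back, loosely similar to a data dictionary lookup; the part table is hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data definition: name each attribute (column) and say what kind of data it holds.
conn.execute("""
    CREATE TABLE part (
        part_id     INTEGER PRIMARY KEY,
        description TEXT NOT NULL,
        unit_price  REAL
    )
""")

# table_info returns one row per column: (cid, name, type, notnull, default, pk).
columns = [
    (name, col_type)
    for _cid, name, col_type, _notnull, _default, _pk
    in conn.execute("PRAGMA table_info(part)")
]
```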
information policy
states an organization's rules for organizing, managing, storing, and sharing information
rows
store data for separate records, or tuples. In a table for customers, the information about a single customer resides in one row
Online Analytical Processing (OLAP)
supports multidimensional data analysis, enabling users to view the same data in different ways using multiple dimensions
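Example (a toy sketch of viewing the same data along different dimensions, in plain Python rather than a real OLAP tool; the sales figures, dimension names, and the rollup helper are invented for illustration):

```python
from collections import defaultdict

# Hypothetical sales facts: (product, region, month, units sold).
DIMENSIONS = ("product", "region", "month")
sales = [
    ("Bolts", "East", "Jan", 100),
    ("Bolts", "West", "Jan", 150),
    ("Nuts", "East", "Jan", 80),
    ("Bolts", "East", "Feb", 120),
    ("Nuts", "West", "Feb", 60),
]


def rollup(facts, *dims):
    """Total the units sold, grouped by the chosen dimensions."""
    positions = [DIMENSIONS.index(d) for d in dims]
    totals = defaultdict(int)
    for fact in facts:
        key = tuple(fact[i] for i in positions)
        totals[key] += fact[-1]
    return dict(totals)


# The same data, viewed two different ways.
by_product = rollup(sales, "product")
by_region_month = rollup(sales, "region", "month")
```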
key field
uniquely identifies each record
analytic platforms
use both relational and non-relational technologies optimized for analyzing large datasets. Include in-memory systems, NoSQL DBMS, and data lakes
business intelligence
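tools and techniques for organizing, analyzing, and providing access to data to support better business decisions; in short, what you do with the data you have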