Chapter 6 - CIS
In-memory computing
used in big data analysis; uses the computer's main memory (RAM) for data storage to avoid delays in retrieving data from disk; can reduce hours or days of processing to seconds; requires optimized hardware
Challenge of big data
volumes too great for a typical DBMS (petabytes, exabytes of data); can reveal more patterns, relationships, and anomalies; requires new tools and technologies to manage and analyze
Databases in the cloud
Appeal to start-ups and smaller businesses; examples include Amazon Relational Database Service, Microsoft SQL Azure, and private clouds
Key services of Hadoop
Hadoop Distributed File System (HDFS), MapReduce (breaks processing into small tasks and distributes them across the nodes of a cluster), HBase (a NoSQL database)
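The MapReduce idea can be sketched in miniature: a map phase emits key-value pairs, a shuffle groups them by key, and a reduce phase combines each group. This is a single-process word-count sketch for illustration only; real Hadoop distributes these phases across many machines, and all names here are illustrative.

```python
from collections import defaultdict

# Toy MapReduce word count. In Hadoop, each phase would run in
# parallel across cluster nodes; here everything runs locally.

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word in the input
    for line in lines:
        for word in line.split():
            yield (word, 1)

def shuffle(pairs):
    # Shuffle: group intermediate pairs by their key (the word)
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the counts collected for each word
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data big tools", "big clusters"]
counts = reduce_phase(shuffle(map_phase(lines)))
# counts["big"] is 3, counts["data"] is 1
```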
Field
IS 101 (Course Field)
Rows (tuples)
Records for different entities
Fields (columns)
Represents an attribute of an entity
Operations of a Relational DBMS
Select, join, project
Join
combines relational tables to provide the user with more information than is available in the individual tables
Select
creates a subset consisting of all records that meet stated criteria
Project
creates subset of columns in table, creating tables with only the information specified
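The three operations can be tried out with SQL against an in-memory database. This sketch uses Python's built-in sqlite3 module with hypothetical PART and SUPPLIER tables modeled on the part-number/supplier-number example in these cards; the specific column names and values are assumptions for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE part (part_number INTEGER PRIMARY KEY, "
            "part_name TEXT, supplier_number INTEGER)")
cur.execute("CREATE TABLE supplier (supplier_number INTEGER PRIMARY KEY, "
            "supplier_name TEXT)")
cur.executemany("INSERT INTO part VALUES (?, ?, ?)",
                [(137, "Door latch", 8259), (150, "Door handle", 8444)])
cur.executemany("INSERT INTO supplier VALUES (?, ?)",
                [(8259, "CBM Inc."), (8444, "B.R. Molds")])

# Select: a subset of records meeting stated criteria
selected = cur.execute(
    "SELECT * FROM part WHERE part_number = 137").fetchall()

# Join: combine the two tables on the shared supplier_number field
joined = cur.execute(
    "SELECT part.part_name, supplier.supplier_name "
    "FROM part JOIN supplier "
    "ON part.supplier_number = supplier.supplier_number").fetchall()

# Project: a subset of columns, keeping only the information specified
projected = cur.execute("SELECT part_name FROM part").fetchall()
```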
DBMS
database management systems
Attribute
each characteristic or quality describing entity
Hadoop
enables distributed parallel processing of big data across inexpensive computers
Primary key
key field whose value uniquely identifies each record in the table; cannot contain duplicate values
Key field
field used to uniquely identify each record
Problems with the traditional file environment
files maintained separately by different departments, data redundancy, data inconsistency, program-data dependence, lack of flexibility, poor security, lack of data sharing availability
Table
grid of columns and rows
Field
group of characters as word(s) or number(s)
File
group of records of same type
Record
group of related fields
Relational DBMS
represent data as two-dimensional tables; each table contains data on an entity and its attributes
Analytic platforms
High-speed platforms using both relational and non-relational tools optimized for large data sets
Data warehouse
Stores current and historical data from many core operational transaction systems, consolidates and standardizes information for use across enterprise but data cannot be altered, provides analysis and reporting tools
Database
group of related files
DBMS
interfaces between applications and physical data, separates logical and physical views of data, solves problems of traditional file environment
Non-relational databases
more flexible data model, data sets stored across distributed machines, easier to scale, handle large volumes of unstructured and structured data
Primary key
part number
Entity
person, place, thing, or event on which we store and maintain information
Foreign key
primary key used in a second table as a look-up field to identify records from the original table
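A foreign-key look-up can be sketched without a database at all: the second table carries the first table's primary key as a field and uses it to find the matching record. The supplier/part names and numbers below are illustrative, following the supplier-number example in these cards.

```python
# SUPPLIER table keyed by its primary key (supplier_number)
suppliers = {8259: "CBM Inc.", 8444: "B.R. Molds"}

# PART table; supplier_number here is a foreign key referencing
# the SUPPLIER table's primary key
parts = [
    {"part_number": 137, "part_name": "Door latch", "supplier_number": 8259},
]

# Use the foreign key as a look-up field into the original table
supplier_name = suppliers[parts[0]["supplier_number"]]
# supplier_name is "CBM Inc."
```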
Database
serves many applications by centralizing data and controlling redundant data
Normalization
streamlining complex groupings of data to minimize redundant data elements and awkward many-to-many relationships
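One way to see normalization concretely: a denormalized table that repeats supplier details on every part row can be split into a parts table and a suppliers table, removing the redundant elements. All field names and values below are illustrative assumptions.

```python
# Denormalized rows repeat the supplier name for every part (redundancy)
denormalized = [
    {"part_number": 137, "part_name": "Door latch",
     "supplier_number": 8259, "supplier_name": "CBM Inc."},
    {"part_number": 145, "part_name": "Side mirror",
     "supplier_number": 8259, "supplier_name": "CBM Inc."},
]

# Normalize: factor supplier attributes into their own table,
# keyed by supplier_number; parts keep only the foreign key
suppliers = {}
parts = []
for row in denormalized:
    suppliers[row["supplier_number"]] = row["supplier_name"]
    parts.append({"part_number": row["part_number"],
                  "part_name": row["part_name"],
                  "supplier_number": row["supplier_number"]})
# The repeated supplier name is now stored exactly once
```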
File
all records of (student ID, course, date, grade) for a course
Record
student ID, course, date, grade
Data marts
subset of data warehouse, typically focus on single subject or line of business
Foreign key
supplier number