Lamar Database Design Mid-Term
Types of No-SQL Database Systems
*Key-Value Pair Systems *Column-Oriented Table Systems *Document-Oriented System *Graph-Oriented System
Ordered by Field Value
<field value,pointer to record>
Schema Diagram
A diagrammatic display of a database schema
What is a file system:
A set of files is called a file system
What are different types of threat (provide examples)
Accidental: Improper access, unable to recover, Concurrent collisions, system errors, hardware failure. Malicious: Bypass security, Unauthorized access, information give under duress or for personal gain, hacking, programming backdoors.
Volume
Amount of Data in System
UOD or Universe of Discourse
Aspects of the real world. Changes to DB suppose to reflect changes in UoD
Deference Between B-Tree and B+Tree
B-tree pointers at all levels B+tree pointers at leaf level
A copy of the database made to be used in case of a failure is called a: a)Recovery Log b)Security System c)Transaction Log d)Backup
Backup
Column-Oriented Table
Based on the approach of BigTable. FLEXABLE
key-value pair
Builds on the key-value pair approach of HDFS
What is the difference between clustering index and secondary index
Clustering index is on field specifying sequential order of file, there can only be one clustering index while secondary can have many indices.
Database System
Computerized record system
What is knowledge
Consists of patterns that transfer data into information.
Multi-Level Indexes
Creates an index to the index
Self-Describing nature of a database system:
DBMS catalog stores the description of the database (called meta-data), allows DBMS software to work with different databases
Advantages of using database approach
Data Independence, redundancy, sharing data, unauthorized access restricting, persistent storage, efficient query processing, backups, muliple interfaces, represents complex relationships, enforces integrity, draws inferences
Document-Oriented System
Data in form known as document. XML or JSON. Doc dont need same structure
Graph-Oriented System
Data in nautrial hierarchical or network sturcture, uses graph to traversal techniques
Database system components
Data, Hardware, Software and Users
In the object-oriented model, one important difference between program objects and database objects is that: a)Database objects are encapsulated b)database object persist c)database object have identifiers d)database objects have state and behavior
Database objects are persist
Quality of Information
Depends on accuracy, Timeliness, and Relevancy
Veracity
Detecting and correcting noise and inconsistent data
Three-Schema Architecture
External Level (end-users), Conceptual Level (Conceptual Schema), Internal Level (Internal Schema/Stored Database)
Computerized record-keeping
File system and Database Management System (DBMS)
What is a file system and what are the drawbacks of a file system?
File system is a set of files. Drawbacks: redundancy, inconsistency, lack of data independence, programs are needed for each file.
File System Vs Database Approach
File system requires programming, database does not require programming (has reports, queries and generic functionality)
Which of the following storage technologies first offered directed access to stored records? a)paper tape b)magnetic tape c)magnetic disk d)punch cards
Magnetic Disk
Reason some Data presentations better than others
More consistent, less redundancy, Better Translation into real world, better constraints, generic terms of physical access
Drawbacks of file system
No data independence, no robust, efficient for a specific applications
Faster development of new application is often possible in a database environment because: a)No file create phase is needed b)New applications are independent of old ones c)No program planning is required d)The database administrator does all programming
No file create phase is needed
Emerging Classificaiton of DBMS
Object-oriented, Object-relational
Manual record-keeping
Paper System - Filing Cabinets
Database System Utilites
Perform certain functions: loading data into database, backing up, reorginizing, report generation, monitor performance, user monitoring, compress data, sort, ect.
Single-Level Indexes
Primary index, clustering index, secondary index
Data
Raw facts -tables as they are stored without headings showing what the mean
Tradtional Classification of DBMSs
Relational, Network, Hierarchical
A description of the structure of a database is called a(n): a)outline b)skeleton c)schema d)view
Schema
Main Characteristics of Database Approach
Self-describing nature of a database (catalog), supports multiple views of data, sharing of data, insulation between programs and data
One-Tier Architecture
Server handles all functionality
Which of the following main characteristics of database approach is not presented by Three-Schema Architecture: a)Self-describing nature of the database system(catalog) b)insulation between programs and data and data abstraction c)Support of mulitple views of the data d)Sharing of data and multi-user transaction processing
Sharing of data and multi-user transaction processing
Variety
Structured and Unstructed data. makes it possible to gather new insights when using data together
Velocity
Time Sensitive processes
Why do we need Databases?
Too much data, Sharing Data, Security Control, Recoverablility, Integrity, Applications are independent from data, Platform and HW are indpendent
Analytical Tools
Tools used to enable users to turn knowledge into information
The mini-world that is represented in a database is also called the: a)conceptual model b)universe of discourse c)metadata d)logical model
Universe of Discourse or UoD
Which one of the V's of Big data revers to questions about data correctness and data validity? a)Value b)Variety c)velocity d)veracity
Veracity
Big Data Characteristics
Volume Variety Velocity Veracity Value
B-Tree/B+Tree
allows efficient insertion and deletion of new search values. Each node corresponds to a disk block
Which of the following is an example of data rather than information: a)student grade report b)an alphabetical class roster c)an unordered list of grades for a test d)a faculty schedule
an unordered list of grades for a test
Describe the three-level architecture for databases
architecture consists of the external, logical and internal levels.
Index
auxiliary files that makes searches easier
A copy of the database made to be used in case of failure is called a: a)recovery log b)security system c)transaction log d)backup
backup
Logical Data Independence
capacity to change conceptual schecma without having to change the external schema
Physical Data Independence
capacity to change the internal schema without having to change the conceptual schema.
What is information
data put into meaningful and useful context, communicated to recipients
Big Data
data sets that are too large and too complex for traditional data processing applicaitons
Secondary
defined on key field. includes one entry for each record. unordered data file, defined on secondary key field or non-key field
Internal Schema
describe physical storage and access paths
Conceptual schema
describe the structure and constraints for the whole database for community of users
External Schema
describes the various user views, most likely uses the same data model as the conceptual level.
User Interface Design
different user screens, User interfaces are id'ed during the requirements collection and analysis stage, two types of interfaces one for admins and one for all other users
Support multiple views of data:
each user may see different views that match user interest
Hashed Files
file blocks divided into N-equal-sized buckets
Suppose the size of the table is 13. What is the probe sequence for the key 8 using double hashing method: h(k,i)=k%13; h'(k)=11-k%11?
h(8)=8%13=8; h'(8)=(11-8)%11=3, after first collision key 8 will index at 8+3 or 11
Data models
have concept of schema and instance
Conceptual data model
high-level semantic, how user percieves data. entities, attributes and relationships. entity based or object based
What are the "meta-data"
identities of data elements, data element lengths, data types where the elements can be found logically
Mappings
in schema levels are needed to transform request and data. Allows data independence.
DBMS languages
in some DBMS stroage definition language (SDL) and view definition language (VDL) used to define internal and external schemas
Instance
is the data in the database at a particular time
What is data
known facts that can be recorded and have implicit meaning
Physical data model
low level, how the data is stored on computer
All of the following are potential advantages of using database except: a)control of redundancy b)improved data integrity c)lower cost d)better data consistency
lower cose
User Interfaces
menu-based (web), form-based (naive users), graphical based (point and click, drag and drop), Natural language (written in english), Interfaces with dba, other combinations of all interfaces
Schema
meta-data or data describing data, specified during the designed, not changed a lot
Faster development of new applications is often possible in a database environment because: a) no file creation phase is needed b)new applications are independent of old ones c)no program planning is requried d)the database administrator does all programming
no file creation phase is needed
Faster development of new applications is often possible in a database environment because: a)no file creation phase is needed b)new applicaitons are independent of old ones c)no program planning is required d)the database administrator does all programming
no file creation phase is needed
Primary Index
one index for each block of the data file, defined on primary key field
Clustering Index
one index for each distinct value, index entry points to first data block that contains records, ordered on non-key field
In a business, if a department can have many workers, but each worker belongs to only one department, the cardinality of the department-to-worker relationship is: a)one to one b)one to many c)many to one d)many to many
one to many
Data Model operations
operations for retrievals and updates referring to the concepts of the data model
If entity instances can belong to more than one subclass in specialization the sub-classes are? a)disjointed b)overlapping c)incorrect d)complete
overlapping
Unordered Files
pile or heap, records put at end
When is preferable to use a dense index rathar then a sparse index
preferable when file is not sorted on indexed field or index files is small
Information
processed data, useful for decision-making, formatted report using database
Implementation data model
representational, falls between for conceptual and physical data models
The goal of the three-schema architecture
separate user applications and physical database
What is the goal of Three-Schema Architecture and Data Independence?
separates user applications and the physical database. To provide independence and insulation of data presentation from changes to application programs
What is a file
sequence of records where each record is a collection of data values
Ordered Files
sequential files, sorted by values, binary search to find
Data Model
set of concepts to describe the structure and contraints the database should obey
Issues with Manual Record-Keeping
slow, keep data in order, expensive, no remote access, no sharing, only one copy-no backup
A database is best used to support which type of data, choose all that apply: a)static current data b)dynamic current data c)static historical data d)dynamic historical data
static and dynamic current data
Spanned Record
stored in more than one block
Unspanned Record
stored in only one black
NoSQL databases are used mainly for handling large volumes of ____________ data? a)unstructured b)structured c)Semi-structured d)all of the above
unstructured
Data Definition Language (DDL)
used by DBA to define the conceptual schema also used to define internal and external schema (views)
Data Manipulation Language (DML)
used to specify database retrievals and updates. Stand-alone can be applied directly (query language) Data Sublanguage can be embedded in general-purpose