S307 Midterm
Main difference between file-based system vs. database system?
File-based system has no DBMS; front-end responsible for interactions, back-end responsible for data
In a file processing system environment, descriptions for data and logic for accessing the data is built into:
Application programs
Predictive Analytics
extracts information from data and uses it to predict future trends and identify behavioral patterns
Attribute
field, piece of data stored in table
Primary Key
relational database table column (or combination of columns) designated to uniquely identify all table records
ER Diagram (Schema)
results from an ER model that is straightforward to explain to the users and therefore can be used as a communication tool between the designers and the users
Relation
Mathematical term for table
Output of Requirements Analysis:
Meeting w/ client; specifics, notes
Functional Decomposition
Meeting with client; taking notes
Output of Logical Design:
Nitty-gritty details; where everything will go
3 Rules about Keys
- No two rows can have the same primary key - No primary key can be null
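The two rules listed above can be checked against a real engine. This is a minimal sketch using Python's built-in sqlite3 module; the student table and its rows are made up for illustration. NOT NULL is written out explicitly because SQLite, for legacy reasons, does not always imply it for a primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: student_id is the primary key and may not be null.
conn.execute(
    "CREATE TABLE student (student_id TEXT PRIMARY KEY NOT NULL, name TEXT)"
)
conn.execute("INSERT INTO student VALUES ('S1', 'Ada')")


def try_insert(student_id, name):
    """Return True if the row is accepted, False if a key rule rejects it."""
    try:
        conn.execute("INSERT INTO student VALUES (?, ?)", (student_id, name))
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_ok = try_insert("S1", "Grace")  # repeats an existing key
null_ok = try_insert(None, "Linus")       # key is missing
new_ok = try_insert("S2", "Edsger")       # fresh, non-null key
```

Only the last insert should be accepted; the first two break one rule each.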
For businesses using Big Data tools effectively, which analytics method do most organizations use to increase business revenue?
Predictive Analytics
3 Phases of SDLC that are DBMS independent:
Requirements Analysis, Conceptual Design, Logical Design (doesn't matter what technology you choose)
6 Phases of SDLC:
Requirements Analysis, Conceptual Design, Logical Design, Physical Design, Implementation, Maintenance
Tuple
Row of data
Record
Row; Logically connected set of one or more fields
Total Specialization
Specifies that each entity instance of the supertype must be a member of some subtype in the relationship. D for Disjoint = each subtype is distinct
Physical Design
Take model and build into table in DBMS
Minimum Viable Product
The most pared-down version of a product that can still be released. Feedback from users guides the future development of the product.
composite primary key
The primary key formed by combining two or more columns in a table.
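A small sketch of a composite primary key, assuming a hypothetical order_line table in SQLite. Neither column is unique on its own; only the pair (order_id, line_no) has to be.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE order_line (
        order_id INTEGER NOT NULL,
        line_no  INTEGER NOT NULL,
        item     TEXT,
        PRIMARY KEY (order_id, line_no)   -- the key is the pair of columns
    )
""")
conn.execute("INSERT INTO order_line VALUES (1, 1, 'pen')")
conn.execute("INSERT INTO order_line VALUES (1, 2, 'ink')")  # same order, new line
conn.execute("INSERT INTO order_line VALUES (2, 1, 'pad')")  # same line_no, new order

try:
    conn.execute("INSERT INTO order_line VALUES (1, 1, 'cap')")  # repeats the pair
    duplicate_pair_rejected = False
except sqlite3.IntegrityError:
    duplicate_pair_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM order_line").fetchone()[0]
```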
Normalization
The process of applying rules to a database design to ensure that information is sound and free of undesirable characteristics: - Inconsistency - Data redundancy - Issues with CRUD. Decides which attributes should be grouped together in a relation
TRUE OR FALSE: Systems Development Lifecycle (SDLC) is the traditional methodology used to design, develop, and maintain systems.
True
The Five V's of Big Data
Volume - large amounts; Variety - different kinds of data; Velocity - data arrives fast; Veracity - how do you verify it?; Value - valuable to company & profit
Super Entity and Sub Entity Steps:
Whatever the primary key is of Super Entity will now become primary key of sub entity
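One way to picture this step, as a sketch with hypothetical employee (super entity) and manager (sub entity) tables in SQLite: the sub entity's primary key is the same emp_id, and it also points back to the super entity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.executescript("""
    CREATE TABLE employee (              -- super entity
        emp_id INTEGER PRIMARY KEY,
        name   TEXT NOT NULL
    );
    CREATE TABLE manager (               -- sub entity reuses the same key
        emp_id INTEGER PRIMARY KEY REFERENCES employee(emp_id),
        bonus  REAL
    );
    INSERT INTO employee VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO manager  VALUES (1, 500.0);
""")

# Joining on the shared key recovers the full picture of each manager.
managers = conn.execute(
    "SELECT e.name, m.bonus FROM employee e JOIN manager m ON m.emp_id = e.emp_id"
).fetchall()
```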
The datanode and namenode are respectively
Worker and master nodes
Agile
an approach that employs incremental and iterative work cycles; incremental software development
Associative Entity
an entity type that associates the instances of one or more entity types, and contains attributes that are peculiar to the relationship between those entity instances
Derived Attribute
attributes for which the value is derived or calculated from base attributes - ex) date of birth of an employee is the base attribute and the age is calculated from that
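The age example can be sketched in a few lines of Python. The function name age_on is made up for illustration; the point is that only the date of birth is stored and the age is computed on demand.

```python
from datetime import date


def age_on(date_of_birth: date, today: date) -> int:
    """Derive age in whole years from the stored base attribute."""
    years = today.year - date_of_birth.year
    # Subtract one if this year's birthday has not happened yet.
    if (today.month, today.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


day_before_birthday = age_on(date(1990, 6, 15), date(2024, 6, 14))
on_birthday = age_on(date(1990, 6, 15), date(2024, 6, 15))
```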
What tools do we use to store and retrieve big data?
- Hadoop - MongoDB - Google BigQuery
What are the 5 problems of file-based systems?
- Limited data sharing - Program data dependence - Duplication of data - Lengthy development times - Excessive program maintenance
What do we do with Big Data?
- Nothing - Study & gain insights (mining) - Sell it, share it (Amazon, Google, etc.)
2nd Normal Form
- Table is in 1NF - Removes partial dependencies - All non-prime attributes are functionally dependent on the whole candidate key
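A rough illustration of removing a partial dependency, using plain Python dictionaries and made-up order data. The key is (order_id, product_id), but product_name depends on product_id alone, so it moves to its own table.

```python
# Hypothetical table whose key is (order_id, product_id).
rows = [
    {"order_id": 1, "product_id": "P1", "product_name": "Pen", "qty": 2},
    {"order_id": 1, "product_id": "P2", "product_name": "Ink", "qty": 1},
    {"order_id": 2, "product_id": "P1", "product_name": "Pen", "qty": 5},
]

# product_name depends only on product_id (part of the key), so split it out.
product = {r["product_id"]: r["product_name"] for r in rows}

# What remains depends on the whole key.
order_item = [
    {"order_id": r["order_id"], "product_id": r["product_id"], "qty": r["qty"]}
    for r in rows
]
```

After the split, "Pen" is stored once instead of once per order.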
Many-to-Many relationship steps:
1. Put table between them 2. Both outside tables give up primary key as foreign key to middle table
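The two steps above can be sketched in SQLite with hypothetical student and course tables. The middle table (called enrollment here) holds both primary keys as foreign keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE course  (course_id  INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE enrollment (            -- the table placed in the middle
        student_id INTEGER REFERENCES student(student_id),
        course_id  INTEGER REFERENCES course(course_id),
        PRIMARY KEY (student_id, course_id)
    );
    INSERT INTO student VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO course  VALUES (10, 'Databases'), (20, 'Networks');
    INSERT INTO enrollment VALUES (1, 10), (1, 20), (2, 10);
""")

# One student, many courses.
ada_courses = [row[0] for row in conn.execute("""
    SELECT c.title FROM enrollment e
    JOIN course c ON c.course_id = e.course_id
    WHERE e.student_id = 1 ORDER BY c.title
""")]

# One course, many students.
db_students = [row[0] for row in conn.execute("""
    SELECT s.name FROM enrollment e
    JOIN student s ON s.student_id = e.student_id
    WHERE e.course_id = 10 ORDER BY s.name
""")]
```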
When did P. Chen propose Entity-Relationship?
1976
When did EF Codd publish paper of use of relational database model?
1970-1972
When did SQL become the standard query language?
1980s (ANSI standard in 1986)
Output of Conceptual Design:
1st rough draft concept model; identify people involved, items involved, products sold
Characteristics of Relational Data Model: - A table is a _______________ structure composed of ______ and ______. - Each row is a _________ within _________. - Each column represents an __________ and has a _________ name. - Each row-column __________ represents a _________ value. - A table has a __________ key that ________ identifies a row. - All values must conform to the __________ data format. - Order of rows and columns is __________.
2-dimensional; rows (records); columns (attributes) - single entity; entity set - attribute; distinct - intersection; single data - primary; uniquely - same (by design; rejects bad data) - insignificant
3rd Normal Form
- Table is in 2NF - No transitive dependencies - All non-key columns depend directly on the primary key
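A sketch of spotting and removing a transitive dependency, with made-up employee rows in plain Python. The helper determines is hypothetical and only checks the sample data; it cannot prove a dependency holds in general.

```python
def determines(rows, lhs, rhs):
    """Return True if each lhs value maps to exactly one rhs value in rows."""
    seen = {}
    for row in rows:
        if seen.setdefault(row[lhs], row[rhs]) != row[rhs]:
            return False
    return True


employees = [
    {"emp_id": 1, "dept_id": "D1", "dept_name": "Sales"},
    {"emp_id": 2, "dept_id": "D1", "dept_name": "Sales"},
    {"emp_id": 3, "dept_id": "D2", "dept_name": "Audit"},
]

# emp_id -> dept_id -> dept_name: dept_name depends on the key only indirectly.
transitive = determines(employees, "emp_id", "dept_id") and determines(
    employees, "dept_id", "dept_name"
)

# Fix: keep dept_id with the employee, move dept_name to its own table.
employee = [{"emp_id": r["emp_id"], "dept_id": r["dept_id"]} for r in employees]
department = {r["dept_id"]: r["dept_name"] for r in employees}
```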
Binary Relationship
A relationship between the instances of two entity types.
Referential Integrity
A rule that prevents orphaned records.
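A minimal sketch of the rule in action, assuming hypothetical customer and purchase tables in SQLite. Note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON is issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default in SQLite
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE purchase (
        purchase_id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
    );
    INSERT INTO customer VALUES (1, 'Ada');
    INSERT INTO purchase VALUES (100, 1);
""")


def rejected(sql):
    """Return True if the statement is refused by an integrity constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# A purchase for a customer that does not exist would be an orphan.
orphan_insert_rejected = rejected("INSERT INTO purchase VALUES (101, 99)")
# Deleting a customer who still has purchases would orphan those rows.
parent_delete_rejected = rejected("DELETE FROM customer WHERE customer_id = 1")
```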
Surrogate Key
A system-assigned primary key, generally numeric and auto-incremented.
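For illustration, a sketch of a surrogate key in SQLite with a hypothetical customer table. The application never supplies customer_id; the database assigns it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY AUTOINCREMENT,  -- system-assigned
        email       TEXT NOT NULL
    )
""")

# Only the business data is supplied; lastrowid reports the generated key.
first_id = conn.execute(
    "INSERT INTO customer (email) VALUES ('ada@example.com')"
).lastrowid
second_id = conn.execute(
    "INSERT INTO customer (email) VALUES ('grace@example.com')"
).lastrowid
```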
Multi-value Attribute
A type of attribute that can have more than one value at one time - ex) multiple phone numbers - work, cell, home
Integrity
Accuracy and Correctness; DBMS should automatically enforce integrity constraints
Questions when designing: Can I:
Add, Update, Delete
Single-value Attribute
Attributes that can have only a single value at a particular instant in time - ex) A person can't have more than one age value
A database is an organized collection of ___________________ related data.
Logically
Logical Design
Capture every individual piece of data needed and represent it; lay framework for primary and foreign keys
Output of Physical Design:
Coding Process
Big Data
Collection of data from traditional and digital sources inside and outside your company that represents a source for ongoing discovery and analysis
File
Collection of related records
Domains
Set of allowable values for a column (attribute)
The SDLC phase in which diagrams are drawn to show entities and relationships is called the _______________ phase
Conceptual
3 Stages of Design in SDLC:
Conceptual, Logical, Physical
A graphical system used to capture the nature and relationships among data is called a(n):
Data model
Unstructured Data
Data hard to model or store the same way we store traditional structured data. Data collected from social media is the largest source. Very text heavy. Posts, likes, videos, etc.
The conceptual schema is always technology specific.
False
The software that is used to create, maintain and provide controlled access to databases?
Database Management System
What do we call the diagrams we draw to show entities and their attributes and relationships between those entities:
ER Diagrams
A person, place, object, event, or concept about which the organization wishes to maintain data is called a(n):
Entity
Field
Group of characteristics with meaning; column
What signal does the datanode send to the namenode at regular intervals to indicate its presence?
Heartbeat
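The idea behind a heartbeat can be sketched with a toy model. This is not Hadoop's API; the class NameNodeSketch, its method names, and the 10-second timeout are all assumptions made for illustration.

```python
class NameNodeSketch:
    """Toy master that remembers when each worker last reported in."""

    def __init__(self, timeout_seconds):
        self.timeout_seconds = timeout_seconds
        self.last_seen = {}

    def heartbeat(self, node_id, now):
        """Record that node_id signalled its presence at time `now`."""
        self.last_seen[node_id] = now

    def live_nodes(self, now):
        """Workers whose last heartbeat is within the timeout window."""
        return sorted(
            node
            for node, seen in self.last_seen.items()
            if now - seen <= self.timeout_seconds
        )


master = NameNodeSketch(timeout_seconds=10)
master.heartbeat("dn1", now=0)
master.heartbeat("dn2", now=0)
master.heartbeat("dn1", now=8)   # dn1 keeps reporting, dn2 goes silent
alive_at_15 = master.live_nodes(now=15)
```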
Foreign Key
How tables communicate with each other; a field (or collection of fields) in one table that uniquely identifies a row of another table; this type of key is defined in a second table, but it refers to the primary key in the first table
The SDLC phase in which data processing programs are coded and tested is called the _______________ phase
Implementation; yields a database
Older systems that often contain data of poor quality are called ________ systems.
Legacy
Domain Constraint
column values must be in a given set of specific values
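One common way to express a domain constraint is a CHECK clause. The sketch below uses SQLite and a hypothetical ticket table whose status column may only hold 'open' or 'closed'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE ticket (
        ticket_id INTEGER PRIMARY KEY,
        status    TEXT NOT NULL CHECK (status IN ('open', 'closed'))
    )
""")
conn.execute("INSERT INTO ticket VALUES (1, 'open')")  # value is in the domain

try:
    conn.execute("INSERT INTO ticket VALUES (2, 'pending')")  # outside the domain
    bad_value_rejected = False
except sqlite3.IntegrityError:
    bad_value_rejected = True

ticket_count = conn.execute("SELECT COUNT(*) FROM ticket").fetchone()[0]
```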
Multi-structured Data
data in a wide variety of formats and forms. Data collected from people and machines. Data collected on cellphone. Operational database, transactional database, TRACKING THINGS IN REAL TIME - social media tags, etc. STRUCTURED, STRICT
Metadata
details, descriptions of data (data about data)
Transaction Table
holds a list of a particular kind of event which we want to record - Changes frequently
What kinds of relationships are not possible in a relational database?
many-to-many (not directly; they must be resolved with a table in between)
In Logical Design ________attributes get their own table
multi-value
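As a sketch of what "their own table" can look like, assume a hypothetical person with several phone numbers. Each number becomes one row in a separate person_phone table that points back to the person.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE person_phone (          -- one row per phone number
        person_id INTEGER REFERENCES person(person_id),
        kind      TEXT,
        number    TEXT,
        PRIMARY KEY (person_id, number)
    );
    INSERT INTO person VALUES (1, 'Ada');
    INSERT INTO person_phone VALUES
        (1, 'work', '555-0100'),
        (1, 'cell', '555-0101'),
        (1, 'home', '555-0102');
""")

phones = conn.execute(
    "SELECT kind, number FROM person_phone WHERE person_id = 1 ORDER BY number"
).fetchall()
```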
To link tables, you need to add the _______ of one table to the other as a _________.
primary key; foreign key
1st Normal Form
table free from repeating groups - field value cannot be decomposed into smaller parts
Descriptive Analytics
the use of data to understand past and current business performance and make informed decisions
Prescriptive Analytics
tools, optimization that create models indicating the best decision to make or course of action to take
A data warehouse derives its data from:
various operational data sources