FINAL REVIEW Chapter 1 - MONGO DB

Réussis tes devoirs et examens dès maintenant avec Quizwiz!

Optional attribute:

Attribute does not require a value and can be left empty

Transitive Dependency

Attribute is dependent on a non-key attribute

MapReduce + Hadoop =

Big Data core technology

Business Rules

Brief, Precise description of a policy, procedure, or principle of a business.

Atomicity

Characteristic of an atomic attribute

Production database

Contains the data you are using for production tasks

Operational database:

Designed to support a company's day to day operations

MongoDB (Drop Command)

Drops a collection

Partial Dependency

Functional dependence in which the determinant is only part of the primary key

Business Intelligence

Information collected from multiple sources such as suppliers, customers, competitors, partners, and industries that analyzes patterns, trends, and relationships for strategic decision making

___________ allows information to be intelligently combined from two or more tables

Join

___________ consist of one or more attributes that determine other attributes

Keys

Granularity

Level of detail represented by the values stored in a table's row

Semi-Structured Data:

Lies between the lines of both data types Generated during a web transaction → Web applications JSON Data

Natural join

Links tables by selecting only the rows with common values in their common attributes

OLAP (online analytical processing)

Manipulation of information to create business intelligence in support of strategic decision making

Data Integrity Management:

Minimizes redundancy and maximizes consistency

Enterprise database

More than 50 people, supports many users across many departments

inner join

Most common type of join; includes rows in the query only when the joined field matches records in both tables.

Class

Represents an object or a set of objects that share a common structure and behavior.

INTERSECT

Returning only rows that appear in both sets query INTERSECT query

JOIN with a Lack of Common Attributes

Returns a products of the two tables

Alter Table

The SQL command used to make changes to table structure. When the command is followed by a keyword (ADD or MODIFY), it adds a column or changes column characteristics.

COMMIT statement

To commit changes made within a manual transaction to the database.

Online Analytical Processing (OLAP)

Tools for retrieving, processing, and modeling data from the data warehouse

MongoDB

an open source, document-oriented, non-relational DBMS

ROLLBACK statement

is used restore the database to its previous condition

Object-oriented database management system

A group of programs that manipulate an object-oriented database and provide a user interface and connections to other application programs.

Hadoop Distributed File System (HDFS)

A highly distributed, fault-tolerant file storage system designed to manage large amounts of data at high speeds.

JSON (JavaScript Object Notation)

A human-readable text format for data interchange that defines attributes and values in a document.

Composite Key

A primary key that consists of more than one attribute.

Database Management System (DBMS)

A program/service that creates, reads, updates, and deletes data in a database while controlling access and security

Subquery

A query (Select Statement) inside another query inner query is executed first

outer join

A relational algebra JOIN operation that produces a table in which all unmatched pairs are retained; unmatched values in the related table are left null.

Natural Join

A relational operation that yields a new table composed of only the rows with common values in their common attribute(s). -Eliminates duplicate columns

Crow's Foot notation

A representation of the entity relationship diagram that uses a three-pronged symbol to represent the "many" sides of the relationship.

Atomicity, Consistency, Isolation, Durability

A set of best practices for programmers to seek in all application or data base design

relational algebra

A set of mathematical principles that form the basis for manipulating relational table contents; the eight main functions are SELECT, PROJECT, JOIN, INTERSECT, UNION, DIFFERENCE, PRODUCT, and DIVIDE.

relational database management system

A software program in which data is organized as a collection of tables, and relationships between tables are formed through a common field.

Data warehouse:

Stores data in a format optimized for decision support

Data Dictionary:

Stores definition of data elements and their relationship Not the data itself, it describes the structure of the data.

use demo

Switch current database to demo

database instance

The actual data stored in a database at a particular moment in time

FROM clause

The clause that indicates the table from which to retrieve the specified columns

Client Node

The client node acts as the interface between the user application and the HDFS.

Transaction

is a sequence of database operations that access the database have four main properties: atomicity, consistency, isolation, and durability

1NF Characterisitcs

- All attributes are defined - There are no repeating groups in the table - All attributes are dependent on the primary key *There are some partial dependencies*

3NF characteristics

- Contains no transitive dependencies - Is in 2NF

2NF characteristics

-1NF has a composite primary key (dual) OR a single-attributed primary key - Has no partial Dependencies

Workgroup databases

50 people or less, support small number of users within a specific department

column family database

A NoSQL database model that organizes data into key-value pairs, in which the value component is composed of a set of columns that vary by row.

key-value database

A NoSQL database model that stores data as a collection of key-value pairs in which the value component is impossible to understand to the DBMS

Document Database

A NoSQL database model that stores data in key-value pairs in which the value component is composed of a tag-encoded document. documents encapsulate and convert data (or information) in some standard formats or encodings

concurrent backup

A backup that takes place while one or more users are working on a database.

Job Tracker

A central control program used to accept, distribute, monitor, and report on MapReduce processing jobs in a Hadoop environment.

MapReduce program

A composite program that consists of a Map procedure that performs filtering and sorting and a Reduce method that performs a summary operation.

BSON (Binary JSON)

A computer-readable format for data interchange that expands the JSON format to include additional data types including binary objects.

logical independence

A condition that exists when the internal model can be changed without affecting the conceptual model.

Centralized database:

A database located at a single site.

single-user database

A database that supports only one user at a time.

Conceptual Schema

A detailed, hardware independent specification of the overall structure of organizational data.

class diagram

A diagram consisting of classes (i.e., sets of objects) and relationship among the classes, usually in UML object notation

graph database

A graph database offers an alternative way to track relationships; its structure resembles sociograms with their interlinked nodes

Database schema

A graphic that documents the data model and shows the tables, attributes, keys, and logical relationships for a database.

Equijoin

A join operator that links tables based on an equality condition that compares specified columns of the tables.

Key Attribute

A key attribute is the unique characteristic of the entity. For ex. Name and hire date are attributes of the entity Employee.

prime attribute

A key attribute; that is, an attribute that is part of a key or is the whole key.

Unified Modeling Language (UML)

A language based on object-oriented concepts that provides tools such as diagrams and symbols to graphically model a system.

Structured Query Language (SQL)

A language used to create and manipulate databases.

data warehouse

A logical collection of information - gathered from many different operational databases - that supports business analysis activities and decision-making tasks

Distributed Database:

A logically related database that is stored over two or more physically independent sites.

Entity Relationship Diagram/Model

A methodology for documenting databases illustrating the relationship between various entities in the database.

Cascading "Order" Sequence

A nested ordering sequence for a set of rows, such as a list in which all last names are alphabetically ordered and, within the last names, all first names are ordered.

Recursive Query

A nested query that joins a table to itself. EX. Identifying managers of employees in EMP table

NoSQL

A nonrelational database that is better tuned for accessing large data sets. Advantages •High scalability, availability, and fault •Uses low-cost commodity hardware •Supports Big Data Disadvantages •Complex programming is required •There is no relationship support •There is no transaction integrity support

Existence independence

A strong entity that exists apart from all of its related entities

Database Security

A subset of information security that focuses on the assessment and protection of information stored in data repositories like database management systems and storage media.

Existence dependence

A weak entity that exists in the database only when it is associated with another related entity

performance tuning

Activities that make a database perform more efficiently in terms of storage and access speed.

WHERE clause

Adds conditional restrictions to the SELECT statement

Insert Statement

Adds one or more records to any single table in a relational database

Unary relationship (recursive relationship)

An association is maintained within a single entity type

primary key (PK)

An attribute or a combination of attributes that uniquely identifies an instance in a table --> cant have a null value *Derivied from candidate keys

Superkey

An attribute or attributes that uniquely identify each entity in a table. --> Sets the basis for identifying candidate keys

atomic attribute

An attribute that cannot be further subdivided to produce meaningful components. EX: Last Name

nonprime attribute

An attribute that is not part of a key.

Derived attribute:

An attribute whose values can be calculated from related attribute values. Age Attribute value is derived from Birthdate attribute

Candidate Key

An attribute(s) that is derived from the Superkey that bears no redundant attributes.

composite/associative entity

An entity with 1:M relationship with parent entities composed of primary key attributes of each parent entity

Neo4j

An organization that requires a graph database that is highly scalable would select the ________ database management system.

Single-valued attribute:

Attribute that has only a single value • e.g.) SSN, a part's serial number

Key attribute:

Attribute that is a part of a key

Multivalued attributes:

Attributes that have many values • e.g.) degrees, a car's color

incremental backup

Backup that copies only the changed data since the last backup. -To prevent redundancy since why would anyone want to back up the same data that's not changed recently.

EXCEPT (Minus)

Combines rows from two queries and returns only the rows that appear in the first set query EXCEPT query or query MINUS query

Heartbeat

Communication between nodes within HDFS to ensure a node is active

Data Definition Language (DDL)

DBMS language that builds the data dictionary, creates the database, describes logical views, and specifies record or field security constraints

Data Manipulation Language (DML)

DBMS language that changes database content, including data element creations, updates, insertions, and deletions

DML

Data Manipulation Language--Used to query and update data stored in the tables.

Isolation

Data used during transaction cannot be used by second transaction until the first is completed

NoSQL

Database management systems that handle non-relational data like documents, increasingly used for maintaining and querying the large amounts of data from the Internet. Queries can be made using "Not only Structured Query Language."

very large databases (VLDBs)

Database that contains huge amounts of data—gigabyte, terabyte, and petabyte ranges are not unusual.

ANSI

Defines the standard data types that are supported by RDBMS vendors

Transaction Managment

Definition based on transaction isolation levels. Transaction isolation levels refer to the degree to which transaction data is "protected or isolated" from other concurrent transactions

strong (identifying) relationship

Exists when an entity pulls a primary key from its parent entity and uses it in its primary key as a foreign key.

JOIN ON

Express a join when the tables have no common attribute names Query returns only the rows that meet the indicated join condition

ETL

Extract, Transform, and Load. Used to standardize data across systems, that allow it to be queried.

foreign key (FK)

Foreign keys are the primary keys of other tables placed in the current table to support the link between the two tables.

Structured Data

Formatted Unstructured Data Categorized pieces of information for one instance

database dump

If your database is large or takes too long to pack, dumping a database is the preferred backup method. This method creates a dump (.dmp) file containing only the database metadata, instead of producing a pack file that contains the file system data as well as the metadata.

Chen notation

In ____________________, relationships are represented by a diamond connected to the related entities through a relationship line.

logical design

Logical design is used to translate the conceptual design into the internal model for a selected database management system

schema-less databases,

NoSQL tools create schema-less databases, allowing data structures such as fields to be added

Unstructured Data:

Not well-defined or under a specific format Like emojis, letters, numbers --> Social media

Optional participation

One entity occurrence does not require a corresponding entity occurrence in a particular relationship --> No instances of entity are required

Mandatory participation

One entity occurrence requires a corresponding entity occurrence in a particular relationship --> Atleast 1 instance of an entity

Project

Only views specific columns in a table

Select

Only views specific rows in a table

Physical model

Operates at lowest level of abstraction •Describes the way data are saved on storage media such as magnetic, solid state, or optical media

transactional database

Optimized for running production systems — everything from websites to banks to retail stores. They excel at reading and writing individual rows of data very quickly while maintaining data integrity.

Index

Orderly arrangement to logically access rows in a table

Consistency

Permanence of database's consistent state

show databases

Print a list of all available databases (that could not be accessed on the server as well)

show collections;

Print a list of all collections for current database.

show dbs

Print a list of all databases on the server.

show roles

Print a list of all roles, both user-defined and built-in, for the current database.

show users

Print a list of users for current database.

Data Anomalies

Refers to data inconsistencies caused by redundancy

INNER JOINS

Returns only rows from the tables that match on a common value

OUTER JOIN

Returns the same matched rows as inner join as well as UNMATCHED rows

Rules of Precendence

Rules that establish the order in which computations are completed

JOIN USING

SELECT * FROM T1 JOIN T2 USING (Common Column) returns only the rows with matching values in the columns indicated in the USING clause

attribute domain

Set of allowable values for one or more attributes within a table

Arithmetic Operators

Symbols to compute mathematical processes

schema-less

Systems that can absorb any type of data, structured or not, from any type of source.

UPDATE Statement

The UPDATE statement is used to update existing records in a table. UPDATE table_name SET column1=value1,column2=value2,... WHERE some_column=some_value; Notice the WHERE clause in the SQL UPDATE statement! The WHERE clause specifies which record or records that should be updated. If you omit the WHERE clause, all records will be updated!

Data Node

The data node stores fixed-size data blocks (that could be replicated to other data nodes).

Extensible Markup Language (XML)

The markup language designed to transport and store data on the Web.

Name Node

The name node stores all the metadata about the file system.

Normalization

The process of reducing redundancy in database tables to prevent errors. However, it reduces database performance since it creates lots of tables --> which requires more joins -Refines primary keys

Denormalization

The process of splitting or combining normalized relations into physical tables based on affinity of use of rows and fields.

Disaster Management

The set of DBA activities dedicated to securing data availability following a physical disaster or a database integrity failure.

Atomicity

The transaction property that requires all parts of a transaction to be treated as a single, indivisible, logical unit of work. All parts of a transaction must be completed or the entire transaction is aborted.

batch processing

Transaction processing system (TPS) that processes data in batches at fixed periodic intervals.

constraint

Used to limit the type of data that can go into a table to ensure that the table is accurate

UNION all

Used to produce a relation that retains the duplicate rows

CREATE VIEW command

Views are built using the CREATE VIEW command. Within this command, you specify the base table(s) on which the view is to be based and the attributes and rows of the table(s) that are to be included in the view.

5 Vs of Big Data

Volume: Data comes from different sources Variety: Data comes in different formats Velocity: Data is transferred at optimal speeds Veracity: Data should be accurate Value: Data contains lots of valuable information

Product

Yields all possible pairs of rows from two tables

Difference

Yields all rows in one table that are not found in the other table

Big Data

a broad term for datasets so large or complex that traditional data processing applications are inadequate. Volume, Velocity, Variety, Veracity (accurate), Value

Database

a collection of organized data that allows access, retrieval, and use of data

stored procedure

a computer program stored in the database that is used to enforce business rules

data mart

a data collection, smaller than the data warehouse, that addresses the needs of a particular department or functional area of the business

aggregation function

a function where the values of multiple rows are grouped together to form a single summary value. SUM, MIN, AVG etc

conceptual model

a verbal or graphical explanation for how a system works or is organized

Alias

are used to give a table, or a column in a table, a temporary name

Required attribute:

attribute must have a value and cannot be left empty

Composite attribute:

attribute that can be subdivided to yield additional attributes -->EX: Body Paint --> subframe paint, rear paint

Audit log:

automatically records description of database operations performed by all users

Physical independence

changes in physical model do not affect internal model

Entity Integrity

condition in which each row in the table has its own unique identity Primary key must have a unique value in each row

Metadata

data that describes data

Create Collection / Rename Collection

db.createCollection("newproducts"); db.newproducts.renameCollection("products"); self explanatory

Update and Remove (Mongo DB)

db.products.update({name: "regular desk chair"},{name: "basic desk chair", price: 100, brand: "RoughRider", type: "chair"}); db.products.remove({name: "basic desk chair"}); Update changes a record in a collection whereas remove drops a record in the colleciton

Data Visualization

describes technologies that allow users to see or visualize data to transform information into a business perspective

bucket

equivalent of tables

dirty data

erroneous or flawed data

full backup

exact copy of an entire database

Cardinality

expresses the minimum and maximum number of entity occurrences associated with one occurrence of related entity

Predictive Analytics

extracts information from data and uses it to predict future trends and identify behavioral patterns

Lock:

guarantees exclusive use of a data item to a current transaction

relationship degree

indicates the number of entities or participants associated with a relationship --> unary etc.

Secondary key

key used strictly for data retrieval purposes

business intelligence

leverages software and services to transform data into insights that influences an organization's decisions.

Theta join

links tables using an inequality comparison operator

data model

logical data structures that detail the relationships among data elements using graphics or pictures --> entities and attributes

Inheritance

object inherits methods and attributes of classes above it

high availability

occurs when a system is continuously operational at all times

Authorization management:

procedures to protect database security and integrity: -User access management -View definition -DBMS access control -DBMS usage monitoring

Task tracker:

reduces tasks on a node

hierarchical model

resembles an upside-down tree in which each class has only one parent

find()

retrieves objects from a collection that match the restrictions provided

Mapper Method

takes a collection of data and sorts and filters it into a set of key-value pairs

Data Administrator (DA)

the person in charge of the structure and integrity of the data; manages important aspects of data definition and database design including: data standards, data use, data quality Has higher authority than dataBASE admin.

Database Administrator (DBA)

the person responsible for coordinating, controlling, and managing the database

data mining

the process of analyzing data to extract information not offered by the raw data alone

Database Design

the process of converting a data model into tables, relationships, and data constraints Well-designed database: facilitates data management and generates accurate and valuable information Poorly designed database: causes difficult-to-trace eros that may lead to poor decision making

Ternary relationship

three entities are associated

HAVING clause

to filter the results grouped by a GROUP BY clause. Can only reference aggregate functions or column that are also specified in the GROUP BY clause.

•Binary relationship

two entities are associated

CREATE INDEX statement

used to define a clustered or non clustered index.

DISTINCT

used to return only distinct (different) values a column often contains many duplicate values; and sometimes you only want to list the different (distinct) values.

Wildcard Character

used to search for words that the user is not sure how to spell or for word variations

EXISTS

used to test for the existence of any record in a subquery. The operator returns TRUE if the subquery returns one or more records.

relational model

uses a two-dimensional table of rows and columns of data. Rows are records (also called tuples), and columns are fields (also referred to as attributes).

functional dependence

value of one or more attributes determines the value of one or more other attributes

ROLLUP Extension

•Used with GROUP BY clause to generate aggregates by different dimensions •Enables subtotal for each column listed except for the last one, which gets a grand total

Simple attribute:

Attribute that cannot be subdivided • e.g.) Age, gender, marital status

Collections (MongoDB)

Can be thought of as the relation (table) in DBMS, but with many differences.

Online Transaction Processing (OLTP)

Capturing of transaction and event information using technology to process it in conformity to business rules, storing it, and update current information.

ORDER BY

Clause used to sort the output of a SELECT statement

Union

Combines all rows from two tables, excluding duplicate rows

UNION

Combines rows from two or more queries without including duplicate rows query UNION query

Concurrency Control

Coordination of the simultaneous transactions executed in a multiuser database system Objective: ensures serializability of transactions in a multiuser database environment

Durability

Ensures that once transactions are committed they cannot be undone or lost

Referential Integrity

Every reference to an entity instance by another entity instance is valid --> referred entity must exist

Weak (non-identifying) relationship

Exists when an entity does not bear any primary keys from its parent entity.

GROUP BY

Groups rows that have the same values into summary rows, like "find the number of customers in each country" often used with aggregate functions to group the result-set by one

New Technologies of Big Data

Hadoop Hadoop Distributed File System MapReduce, NoSQL

Embedded SQL

Hard-coded SQL statements included in a program written in another language, such as C or Java.

Intersect

Identifies what rows are identical in both tables.

Two Main characteristics of Data

Ubiquitous (Global, everywhere) Pervasive (Unescapable, persistent)

CUBE extension

Used with GROUP BY clause to generate aggregates by the listed columns Enables you to get a subtotal for each column listed in the expression, in addition to a grand total for the last column listed

Divide

Uses one double-column table as the dividend and one single-column table as the divisor Output is a single column that contains all values from the second column of the dividend that are associated with every row in the divisor

Date Aritmetic

Values are stored as a number of days;


Ensembles d'études connexes

Chapter 18 Building Careers and Writing Résumés

View Set

Health Assessment in Nursing, Chapter 3

View Set

Nursing Process Chapter 20: Communication

View Set

Microbiology 240 - Exam 1 Multiple Choice

View Set