Database Management Exam 3

¡Supera tus tareas y exámenes ahora con Quizwiz!

What is Key Performance Indicators?

- Quantifiable measurements that asses a company's effectiveness in reaching goals

What is concurrency control?

- The process of managing simultaneous operations on the database without having them interfere with one another. It prevents two or more users from accessing the database simultaneously

Create a trigger example that when someone update a supplier ID of the supplier table, the DB will also update the corresponding supplier ID in the Items table, where it is a foreign key.

CREATE TRIGGER update_supplierID AFTER UPDATE ON Supplier FOR EACH ROW BEGIN UPDATE item SET item.SupplierID = NEW.Supplier_ID WHERE item.Supplier_ID = OLD.SupplierID; END

What are SQL injections?

Injecting SQL code via data that is sent to a DBMS, when (non-parameterized) dynamic SQL is used. This code is used to either retrieve extra data or delete data sets

What are the 4 "Big data" factors impacting traditional BI databases? What does all of this lead to?

- Semi-structured and unstructured data, for ex: mobile apps cannot be efficiently represented into rows and columns in a RDBMS - Rapid evolution of database scheme - High velocity data sources (it can take to much time) - Quickly growing data volumes All of this leads to a growing use of NoSQL

What are 2 steps to monitor a business' health?

- Setting goals - Using Key Performance Indicators (KPIs)

What are the 4 characteristics of a data warehouse?

- Integrated - consistent format and meaning - Subject-oriented - Organized and summarized to answer questions - Time Variant - captures and represents the flow of data over time - Nonvolatile - Once data enters the warehouse, it is never removed

Describe the k-Nearest Neighbor Classification approach

- Looks at characteristics / attributes. A if it walks and sounds like a duck, then its probably a duck approach. The training set are the existing animals in the pond

What is a consistent database? What does a successful transaction do?

- One in which all data integrity constraints are satisfied after a database transaction -Changes one database from one consistent state to another

What is an exclusive lock/write lock? What is a shared lock/read lock

- Prohibits other users from reading the locked resource - Allows users to read the locked resource, but does not allow them to update it

What are the 3 things the Master Data Management does(MDM)?

- Provides for a comprehensive and consistent definition of all data in an organization -Ensures uniform and consistent views of all data -Supports proper governance for controlling and monitoring business health, and creates accountability

What is lock granularity? What are the ranges?

- Refers to the size of the locked resources -Data base level (slow data access) to table level to page level (standard) to row-level (significant overhead)

Why do we use persistent stored modules? What are they?

- SQL itself does not support control statements such as looping operations, so to support this, the SQL 99 standard defined the use of persistent stored modules - These are blocks of code (SQL and procedures) that are stored and executed at the DBMS

Where are 3 places parameterized queries are used

1. Used in web based DSS 2. Used in desktop application (.NET -> MS Access) 3. Used within the actual DBMS

What is a transaction log? What the 3 things does it stores?

- A DBMS maintains a transaction log to support recovery to a consistent state - Stores: 1. A record for the beginning of the transaction 2. A record for each transaction completed (type of operation, tables/attr affected) 3. A record for the end of the transaction (COMMIT)

What is a Data Warehouse? What is its prime objective? What are three resources needed to create a data warehouse? How long does it take to implement a data warehouse?

- A database optimized for data analysis and read-only query processing - To provide fast and accurate answers to data analysis questions - Time, money, and managerial effort - 1 to 3 years

Define the Online Analytical Processing (OLAP)? Where is it used?

- A name for a set of tools and techniques used to analyze multi-dimensional data in a data warehouse

What are, and explain, the 4 properties that a transaction must have:

-Atomicity- All or nothing property, meaning all transaction operations must be completed not part of it -Consistency - When a transaction is completed, the database must be in a consistent state -Isolation - Data used during execution of a transaction cannot be used by a second transaction until the first one is complete -Durability - Once transaction changes are committed, they cannot be undone or lost due to subsequent failure

What are the 4 characteristics of using triggers?

-It is invoked either before or after a data row is inserted, updated, or deleted -It is associated with a database table, each table may have one or more trigger, but each trigger must have a unique name -It is used to automate critical actions / provide warnings -It can be used to enforce constraints, such that of referential integrity

What are the 3 potential problems caused by concurrency? (explain them)

-Lost updates: Occurs when a successfully completed update is overwritten by another transaction (think of X withdrawing $10 from an account at the same time Y deposits $100) -Uncommitted data- Occurs when one transaction accesses the intermediate results of another transaction before they are committed- and the second one is rolled back -Inconsistent retrievals- When a transaction reads several values, but a different transaction updates some of them in the midst of this process, some data is read before they are changed and others after, resulting in inconsistent results

What is Data Mining? (2 things)

-Non-trivial extraction of implicit, previously unknown and potentially useful information from data, also involves the exploration and analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns

What are the 6 advantages of stored procedures?

-Performance - compile once and are quick/efficient -Server-side computation - decreases network traffic -Executable code is cached and shared- lowers memory requirements -Grouping SQL statements allow for a single call execution -Productivity and Ease of Use - avoids redundant coding, and can be called by any app -Security - can limit/restrict users

What are, and explain, 3 examples of supervised algorithms?

1. Direct Marketing - Reduce the cose of mailing, by targeting the mails to specific consumers. By finding out information like demographics, lifestyles about consumers and which of those happen to buy the product 2. Fraud Detection- Use credit cards transactions, by finding out information like when does the customer buy, what do they buy, etc. 3. Customer Attrition/Churn - Predict whether a customer is likely to be lost to a competitor, by finding out previous transactions, etc.

What are the 3 things that become hard to do with no parameterized queries?

1. Effective decision support systems require flexibility when interacting with user 2. Decision makers may not always know what questions they want to ask 3. Pre-determining a list of potential queries can be very limiting

How is a transaction log implemented? When is it updated? What are the tradeoffs?

1. It is typically implemented as one or more files that are stored separately from the database itself 2. It is automatically updated when the DBMS executes transactions that modify the database 3. Increases DBMS processing power but provides the ability to restore a corrupted database

What are the 2 approaches for creating a serializable schedule in concurrent transactions?

1. Locking methods 2. Timestamping

What are two examples of Clustering on Unsupervised Algorithms

1. Market Segmentation- we want to subdivide a market into distinct subsets of customers where any may conceivable be selected as a market target to be reached with a distinct marketing mix, the approach will be to collect different attributes of customers based on their geographical lifestyle 2. Document Clustering- Find groups of documents that are similar to each other based on the important terms appearing in them, this can be done be identifying frequently used terms. Twitter uses this to recognize different disasters in the world

What are the 3 placed in which stored procedures can be called?

1. Programs written in standard languages (Java, C#) 2. Scripting languages (JScript, VBScript) 3. SQL command prompt (SQL*Plus)

What are Supervised Algorithms (Classification) (What kind of algorithm is it known as) (What do we want to predict)

It is a learning by example algorithm. Training data is first gathered which contains the correct answers (class label attributes), using the training data, an algorithm is created to try and identify a new class label for incoming data. We want to build an algorithm that can predict the dependent class attribute for a brand new observation, simply based on the values of its other attirbutes

If a DBMS supports transaction management....

It will roll back an inconsistent database (meaning an unsuccessful transaction) to a previous consistent state, it is basically resetting the database to where it was on a consistent state

What is a trigger? What is it also known as?

- A procedure that is automatically executed by the RDBMS when a given data manipulation event occurs (event handler)

What is a serializable schedule ?

- A schedule of a transaction's operations in which the interleaved execution of all active transactions yields the same results as if those transactions were executed in serial order, it basically interleaves operations when possible

Define Business Intelligence When was the term coined and by who?

- A set of methodologies, processes, architectures, and technologies that transform raw data into meaningful and useful information - Term coined by the Gartner Group in the early 1990s

What is a Data Mart? What are 2 advantages?

- A small, single subject data warehouse subset that provides decision support to a small group of people (like a finance department) - Less organizational commitment required and shorter implementation time (6 months to a 1 year)

What is a stored procedure? What can it receive, and where is it running on?

- A subroutine available to applications accessing a relational database system, sometimes called a sprc or an SP is actually in the database -Can receive input parameters and may return results, and its running WITHIN the database rather than within some outside program environment -Procedure can be ran from a variety of different program languages

What is a database transaction

- Any (possibly multi-step) action that reads from and/or writes a database - It may consists of a single SQL statement or a collection of relation SQL statements

What are Strict 2-Phase Locking (S2PL) What are Strong Strict 2-Phase Locking (SS2PL)

- Are not allowed to release write locks until the very end of the process - Standard technique used here, where both read and write locks can only be released when everything has been completed, making things more limited but provides the most protection

What is optimistic locking? What is pessimistic locking?

- Assumes that no inconsistent transaction(s) will occur. The DBMS processes a transaction to a temporary file, and checks whether conflicts occurred. If so, the transaction is repeated until there is not conflict - Assumes that conflicts will occurs, so locks are issued before a transaction is processed, and then locks are released

What are the 4 Business Intelligence Components?

- Data extraction, transformation, and loading tools - Data storage (data warehouse or data mart) - Data query and analysis tools (OLAP) - Data presentation and visualization tools (Dashboard)

What are Multi-Dimensional DBMS What are 2 advantages?

- Data that is stored in multi-dimensional arrays, visualized as a data cube - Data retrieval is much quicker, provides opportunity to slice and dice

The traditional techniques for coming up with ideas for data mining may be unsuitable due to what 3 reasons?

- Enormity of Data - High Dimensionality of the data - Heterogeneous, distributed nature of data

What are, and explain, the 2 main structures of a Multi-Dimensional DBMS What kind of relationship is going on here?

- Fact Tables - associated with a particular type of aggregated data (A Sales tables dealing with the total amounts) - Dimensional Tables - Attributes that provide descriptive information about the facts within a given dimension (product, time, location, etc.) - There is a one to many relationship from the dimensional tables to the fact tables

Describe what Clustering in on Unsupervised Algorithms

- Given a set of data points, each having a set of attributes, and similarity measure among them, find clusters that data points in one cluster are more similar to one another and data points in separate clusters are less similar to one another. Mostly uses the Euclidean Measures

What are the 6 characteristics of an Online Analytical Processing (OLAP)

- Graphical User Interface - Analytical Processing Logic - Data-processing logic - Advanced functionality for data presentation, aggregation, and modeling - Capacity for multi-dimensional analysis - Used with transactional DB and data warehouses

What is two-phase locking (2PL) (talk about the phases) What does it gaurantee?

- Guarantees serializability by allowing transactions to obtain as many locks as necessary (considered the growing phase), and once the first lock is released (considered the shrinking phase), there are no additional locks obtained

What is the idea of using locking methods? What will this result in?

- In order to prevent a transaction from interfering from another, the first transaction can lock in all the resources it needs and unlock them when it is done, this will result in a longer wait for things to get done

What is Operational Data (Transactional Databases) (what is the focus) What is Decision Support Data (data warehouses) (what do they result in)

- Stored in highly normalized tables in a relational data base, where they are dynamically updated, and focus is of traditional information systems -Stored in formats that facilitate data extractions, data analysis, and decision making, these are often aggregated and result in redundancies

What are the 2 types of data mining algorithms?

- Supervised Algorithms - Unsupervised Algorithms

What are the 3 things needed for the Nearest Neighbor Classifier to work?

- The set of stored records with their associated classifications already known - Distance metric to compute the distance between records - The value of k (the measure), the number of nearest neighbors to retrieve. if k is too small, then the model is sensitive to noise and if its to large, neighborhood may include too many points from other classes

What are deadlocks? What does not prevent it?

- When two transactions effectively go into an infinite loop where each is waiting for the other to release a resource - Regular 2-Phase locking does not prevent this

What are the 3 advantages of using parameterized queries?

-Queries can be compiled once at the beginning of the process, parameter values can then be set as needed -Data types can be enforced without focus on syntax -Security, it can help prevent SQL injections

What is timestamping? How are they ordered? How are conflicts resolved?

1. A unique identifier created by the DBMS that indicates the relative starting time of a transaction. Transactions are ordered globally so that older transactions (transactions with small timestamps) get priority in the event of a conflict. Conflicts are resolved by rolling back and restarting the associated transaction. It does demand a lot of system resources like memory and process overhead

What are, and explain, the two ANSI SQL standards for managing transactions?

1. COMMIT- finalizes all of the changes within the database and ends the transaction, permanently records all changes in the database and automatically ends the transaction 2. ROLLBACK - aborts all uncommitted changes, rolling back the DB to its previous state. The rolled back transaction can be started later on

What are the differences between SQL and NoSQL when it comes to: 1. Data Integrity 2. Transactions 3. Performance 4. Scailing

1. Data Integrity is already an issue with NoSQL, as there are a lot of anomalies (repeated data), so there needs to be some overhead to validate all of the data 2. You do not have transactions in NoSQL 3. Performance is much quicker to retrieve information in NoSQL 4. It is easier to add bigger data chunks in NoSQL rather than in SQL

What are, and explain, 3 basic techniques that help control deadlocks?

1. Deadlock prevention- aborts a transaction if there is a possibility of a deadlock, then reschedules transaction for later execution 2. Deadlock detection- DBMS periodically tests database for deadlocks, if found, the transaction is rolled back 3. Deadlock avoidance- transactions obtain all needed locks before execution

What are the differences between SQL and NoSQL when it comes to: 1. Tables 2. Schemas 3. Normalization 4. JOINs

1. NoSQL does not store data in tables (JavaScript Object Notation Files JSON stores in text files) 2. SQL requires schemas but with NoSQL we use a more flexible way of creating a schema 3. Normalization does not typically happen in NoSQL like in SQL 4. You do not need to use JOINs with NoSQL as everything you would want to JOIN is already together

What are the 6 assumptions of traditional Business Intelligence platform

1. Predictable Frequency - Data is extracted from source systems at regular intervals (days, months, etc.) 2. Static Sources- Data is sourced from controlled, internal systems supporting established and well-defined back-office processes, we know where data is coming from 3. Fixed models- Data structures are known and modeled in advance of analysis 4. Defined Queries - questions to be asked of the data 5. Slow changing requirements - rigorous change control is enforced before the introduction of the new data sources or reporting requirements 6. limited users - the consumers of BI reports are for business managers

What are the internal programming language extension that SQL Server and Oracle have created to create procedures

1. SQL Server: Transact SQL 2. Oracle: Procedural SQL (PL/SQL)


Conjuntos de estudio relacionados

Soil and Agriculture- Environmental Science

View Set

Chapter 6 : critical thinking and reasoning

View Set