CSE 3241 Final Exam

How can we test for conflict serializability?

1. For each transaction Ti participating in schedule S, create a node labelled Ti in the precedence graph.
2. For each case in S where Tj executes a read_item(X) after Ti executes a write_item(X), create an edge (Ti --> Tj) in the precedence graph.
3. For each case in S where Tj executes a write_item(X) after Ti executes a read_item(X), create an edge (Ti --> Tj) in the precedence graph.
4. For each case in S where Tj executes a write_item(X) after Ti executes a write_item(X), create an edge (Ti --> Tj) in the precedence graph.
The schedule is conflict serializable if and only if the precedence graph has no cycles.
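The precedence-graph test above can be sketched in Python. This is a minimal illustration, not a DBMS implementation: the schedule string format (`r1(X); w2(X)`) follows the notation used elsewhere in these notes, and the helper names `parse_schedule`, `precedence_graph`, `has_cycle`, and `is_conflict_serializable` are made up for this example.

```python
import re
from collections import defaultdict

def parse_schedule(text):
    """Parse 'r1(X); w2(X)' into [(op, txn, item), ...]; only reads and writes are kept."""
    return [(op, int(t), item)
            for op, t, item in re.findall(r"([rw])_?(\d+)\((\w+)\)", text)]

def precedence_graph(ops):
    """Add an edge Ti -> Tj for every conflicting pair where Ti's operation comes first."""
    edges = defaultdict(set)
    for i, (op_i, t_i, x_i) in enumerate(ops):
        for op_j, t_j, x_j in ops[i + 1:]:
            if t_i != t_j and x_i == x_j and "w" in (op_i, op_j):
                edges[t_i].add(t_j)
    return edges

def has_cycle(nodes, edges):
    """Depth-first search; reaching a node that is still on the stack means a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {n: WHITE for n in nodes}

    def visit(n):
        color[n] = GREY
        for m in edges.get(n, ()):
            if color[m] == GREY or (color[m] == WHITE and visit(m)):
                return True
        color[n] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in nodes)

def is_conflict_serializable(text):
    ops = parse_schedule(text)
    nodes = {t for _, t, _ in ops}
    return not has_cycle(nodes, precedence_graph(ops))

print(is_conflict_serializable("r1(X); r2(X); w1(X); r1(Y); w2(X); w1(Y)"))
print(is_conflict_serializable("r1(X); w1(X); r2(X); w2(X)"))
```

In the first schedule, r2(X) before w1(X) gives the edge T2 --> T1 and r1(X) before w2(X) gives T1 --> T2, so the graph has a cycle. The second schedule only produces T1 --> T2.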

When are two operations in a schedule in conflict?

1. They belong to different transactions.
2. They operate on the same database item.
3. At least one of the two operations is a write operation.
All three conditions must hold for two operations to conflict.
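As a small sketch, the three conditions can be written as one predicate. The tuple layout `(op, txn, item)` and the name `in_conflict` are assumptions made for this example.

```python
def in_conflict(a, b):
    """Return True when operations a and b conflict.

    Each operation is a tuple (op, txn, item), e.g. ('r', 1, 'X') for r1(X).
    """
    (op_a, t_a, x_a), (op_b, t_b, x_b) = a, b
    different_transactions = t_a != t_b
    same_item = x_a == x_b
    at_least_one_write = "w" in (op_a, op_b)
    return different_transactions and same_item and at_least_one_write

print(in_conflict(("r", 1, "X"), ("w", 2, "X")))  # read/write on X by T1 and T2
print(in_conflict(("r", 1, "X"), ("r", 2, "X")))  # two reads never conflict
```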

Causes of transaction failure?

Unfortunate events, such as:
◦ Computer failure (crash)
◦ Transaction/system errors (overflow/underflow)
◦ Logical/exception errors, e.g., insufficient funds in a banking transaction
◦ Disk failure

How do we specify a DTD XML schema for an XML datafile?

<!DOCTYPE Projects SYSTEM "proj.dtd">
This declaration says the following:
!DOCTYPE - keyword
Projects - name of the document's root element
SYSTEM - the DTD is external
proj.dtd - filename or URL of the schema

When working with DTD, what do +, * and ? mean?

<!ELEMENT Projects (Project+)> means Projects contains 1 or more Project elements.
+ - 1 or more
* - 0 or more
? - 0 or 1

Concurrency control

A DBMS feature that coordinates the simultaneous execution of transactions in a multiprocessing database system while preserving data integrity. You should be familiar with the idea of concurrency: know and recognize that the DB transaction manager provides concurrency control so we can safely execute multiple transactions while maintaining data integrity.

Distributed database?

A logically related database that is stored in two or more physically independent sites, i.e., data is distributed across different sites and all the sites are connected via a communication network.

partial order

A relation R on a set A is a partial order if it is reflexive, transitive, and anti-symmetric. Review CSE 2321 for what these terms mean.
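For a finite set, the three properties can be checked directly. The sketch below represents R as a set of pairs. The function name `is_partial_order` and the "divides" example are illustrative assumptions, not part of the course material.

```python
from itertools import product

def is_partial_order(A, R):
    """Check that R (a set of pairs over A) is reflexive, anti-symmetric and transitive."""
    reflexive = all((a, a) in R for a in A)
    antisymmetric = all(a == b for (a, b) in R if (b, a) in R)
    transitive = all((a, d) in R for (a, b) in R for (c, d) in R if b == c)
    return reflexive and antisymmetric and transitive

A = {1, 2, 3, 6}
# "a divides b" is a classic partial order on positive integers.
divides = {(a, b) for a, b in product(A, repeat=2) if b % a == 0}
print(is_partial_order(A, divides))

# Adding (2, 1) breaks anti-symmetry, because (1, 2) is already in the relation.
print(is_partial_order(A, divides | {(2, 1)}))
```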

What does it mean to be conflict serializable?

A schedule is considered conflict serializable if it is conflict equivalent to some serial schedule

S_a, S_b, etc mean?

A schedule of operations that occur in a database, for example:
S_a: r_1(X); r_2(X); w_1(X); r_1(Y); w_2(X); w_1(Y)

Serial schedule? What about a non-serial schedule?

Serial: a schedule where the operations of each transaction are executed consecutively, without any interleaved operations from other transactions. Given T_1 and T_2, either T_1 goes first or T_2 goes first. Non-serial: the opposite, we can interleave operations from different transactions. We want non-serial schedules because a serial schedule wastes CPU time while waiting on memory or the HDD.
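A rough way to test whether a schedule is serial: once a schedule switches away from a transaction, that transaction must never appear again. The tuple layout `(op, txn, item)` and the name `is_serial` are assumptions for this sketch.

```python
def is_serial(ops):
    """Return True if each transaction's operations form one contiguous block."""
    finished = set()   # transactions whose block has already ended
    current = None     # transaction whose block is running now
    for _, txn, _ in ops:
        if txn != current:
            if txn in finished:
                return False          # txn resumed after another one ran: interleaving
            if current is not None:
                finished.add(current)
            current = txn
    return True

serial = [("r", 1, "X"), ("w", 1, "X"), ("r", 2, "X"), ("w", 2, "X")]
interleaved = [("r", 1, "X"), ("r", 2, "X"), ("w", 1, "X"), ("w", 2, "X")]
print(is_serial(serial), is_serial(interleaved))
```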

Cascading Rollback

A single transaction failure leads to a series of transaction rollbacks.

ROLLBACK TRANSACTION does what?

Abort the transaction - undo all statements from the BEGIN TRANSACTION statement

What does COMMIT TRANSACTION do?

All of the updates can now be "committed"/made permanent/persistent

Once a transaction is in the committed state what does that mean?

All read/write operations have been made permanent. If a DB failure occurs, the database will recover to this point for the data that was modified.

Transaction Properties? Hint recall: ACID

Atomic - a transaction either completely finishes or is not performed at all; no half-done transactions.
Consistent - the database starts and ends the transaction in a consistent state.
Isolated - transactions should always function as if they were executing in isolation from other transactions.
Durable - database changes made by a completed transaction must persist even if the database fails.

In SQL databases we wanted ACID; what is used as a replacement in NoSQL?

BASE: Basically Available, Soft state, Eventually consistent

SQL transaction syntax, all transactions start with? end with?

Start with BEGIN (BEGIN TRANSACTION); end with END (END TRANSACTION)

What does a transaction look like in SQL?

BEGIN TRANSACTION NEW_SP
  INSERT INTO SP VALUES ('S5', 'P1', 1000);
  IF error THEN GO TO UNDO; END IF;
  UPDATE P SET TOTQTY = TOTQTY + 1000 WHERE P#='P1';
  IF error THEN GO TO UNDO; END IF;
  COMMIT;
  GO TO FINISH;
UNDO:
  ROLLBACK;
FINISH:
END TRANSACTION;
Note the GO TO statements, like goto in C. They give us error control: roll back if an error occurs, otherwise commit and end the transaction.
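The same commit-or-rollback pattern can be sketched with Python's standard `sqlite3` module. This is an illustration under assumptions: the in-memory database, the table layouts, the column name `PNO` (standing in for `P#`), and the function `new_shipment` are all made up for this example.

```python
import sqlite3

def new_shipment(conn, supplier, part, qty):
    """Insert a shipment and bump the part total; either both happen or neither does."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute("INSERT INTO SP VALUES (?, ?, ?)", (supplier, part, qty))
            cur = conn.execute(
                "UPDATE P SET TOTQTY = TOTQTY + ? WHERE PNO = ?", (qty, part))
            if cur.rowcount != 1:
                raise ValueError("unknown part")  # plays the role of GO TO UNDO
        return True
    except (sqlite3.Error, ValueError):
        return False

# Hypothetical schema for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE P (PNO TEXT PRIMARY KEY, TOTQTY INTEGER)")
conn.execute("CREATE TABLE SP (SNO TEXT, PNO TEXT, QTY INTEGER)")
conn.execute("INSERT INTO P VALUES ('P1', 0)")
conn.commit()

print(new_shipment(conn, "S5", "P1", 1000))  # both statements succeed, so commit
print(new_shipment(conn, "S6", "P9", 5))     # no part P9, so the INSERT is rolled back
```

After the second call the SP table still holds a single row, because the failed transaction's INSERT was undone together with the rest of the transaction.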

What is a schema used for in XML documents?

Communicates semantic information about an XML document: what does each of these elements mean?
Defines:
◦ Names for expected XML tags/elements
◦ Required/optional elements
◦ Nested elements
◦ The entire structure: root element, child elements, attributes

What is a complex element in XML?

Complex elements hold other elements

What is the transaction manager?

A component of the DB. It executes DB requests from one or more users and protects the data from loss or damage.

How is a DBMS able to restore the database to the last commit point in the case that a transaction fails?

The DBMS keeps a system log (or journal) of transactions. The journal is an append-only file: all transactions are appended to the end of the log file. The journal is on disk, so only a disk failure will prevent recovery of the database.

semi-structured data

Data has some structure, but it is more loosely defined:
◦ No pre-defined schema structure
◦ Not all entities of the same type have the same attributes
◦ New attributes can be added to entities in an ad hoc manner

Structured data?

Data that is represented in a strict format. Databases hold structured data.

What are the three types of XML documents?

Data-centric - highly structured, many small data items, used for data exchange and to create dynamic web pages, follows a schema document.
Document-centric - few structural elements, large amounts of text (articles, blogs, books, etc.), may have a schema document but it is not required.
Hybrid XML - some parts are highly structured, other parts are mostly blocks of text and/or unstructured, may or may not have a predefined schema.

DTD

Document Type Definition

XSLT is used to?

Extensible Stylesheet Language Transformations
◦ Language used to transform XML documents into other documents: HTML, plain text, Formatting Objects, XML
◦ XPath is built into XSLT to allow for powerful queries
◦ Useful as a reporting and data exchange mechanism

How can we characterize transaction schedules?

By how easy it is to recover from a failure: simple, possible but not simple, or impossible. Our goal is to have simple recovery.

Result Equivalence

If two schedules produce the same result after execution, they are said to be result equivalent. This is not a great test; use conflict equivalence instead.

Serializable schedule

In transaction management, a schedule of operations in which the interleaved execution of the transactions yields the same result as if they were executed in serial order.

CAP theorem?

It is impossible in a distributed system to simultaneously provide consistency, availability and partition tolerance. One of the three must be sacrificed.

When is an XML document considered valid?

◦ It is well-formed
◦ It follows a particular schema in a standard definition language: a DTD document (Document Type Definition) OR an XML schema document

Issues with DTD?

Limited data type support and data validation! Doesn't use standard XML syntax either

What is a transaction?

A logical unit of DB processing, i.e., a set of operations you perform on a DB. This includes one or more access operations, which could be reads, writes, deletes, etc.

How can we use XML Schema to replace the DTD document?

See slides 60-65 of the XML presentation for an example. It is more complicated than DTD, as it requires nesting tags upon tags to get the right structure. Basic idea: we declare elements as <xsd:element name="element_name">. For a simple element, we specify the data type with an attribute. For a complex element, we place an <xsd:complexType> tag inside the element. This tag has an <xsd:sequence> tag nested inside, which contains a list of elements (each declared as described above).

Issues with concurrency control?

Lost update - two processes update the same data.
Incorrect summary - a summary process runs at the same time as an update.
Temporary update (dirty read) - a write operation fails and requires an undo. One transaction performs a write but fails later down the road, while another transaction has already read the data it wrote, which is now being undone, so both transactions must be undone.
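The lost update problem can be reproduced with a tiny deterministic simulation. This is a sketch: the `run` helper, the schedule tuples `(op, txn, item)`, and the starting balance are assumptions chosen for the example. Each transaction reads X into a private copy and later writes back that copy plus its own delta.

```python
def run(schedule, db, deltas):
    """Execute read/write steps; a write stores (value the txn read) + deltas[txn]."""
    db = dict(db)          # work on a copy so the caller's data is untouched
    local = {}             # (txn, item) -> value that transaction read
    for op, txn, item in schedule:
        if op == "r":
            local[(txn, item)] = db[item]
        else:
            db[item] = local[(txn, item)] + deltas[txn]
    return db

deltas = {1: -10, 2: +5}   # T1 withdraws 10, T2 deposits 5
serial = [("r", 1, "X"), ("w", 1, "X"), ("r", 2, "X"), ("w", 2, "X")]
interleaved = [("r", 1, "X"), ("r", 2, "X"), ("w", 1, "X"), ("w", 2, "X")]

print(run(serial, {"X": 100}, deltas))       # both updates applied
print(run(interleaved, {"X": 100}, deltas))  # T2 overwrites T1's write
```

In the interleaved schedule both transactions read 100, so T2's write of 105 silently discards T1's withdrawal.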

What are the issues with replication and sharding?

Networking latency and too much replication put a load on the masters. Sharding needs queries to be handled by an extra layer. You should try to design a system with as few writes as possible, making it easier to scale.

Does a schedule being serializable mean it is serial?

No, just that the schedule is correct and leaves the database in a consistent state.

Unstructured data

No structure to the data. For example, in a book it isn't defined where I will find the author's name, title, chapter, etc. I have to read the document to find out.

Can we use an XML parser for DTD?

No, DTD has its own syntax, need a special parser

Is this recoverable? Sc: r1(X); w1(X); r2(X); r1(Y); w2(X); c2; a1 If not how can we fix it?

No. T_2 reads X after T_1 writes it (r2(X) occurs after w1(X)) and then commits (c2) before T_1 aborts (a1), so undoing T_1 would mean reverting the changes of the already committed T_2 as well.
This is one way to fix it:
Sd: r1(X); w1(X); r2(X); r1(Y); w2(X); w1(Y); c1; c2;
This is recoverable because T_1 commits before T_2: if T_1 fails before its commit, we can still undo both transactions. If we need to abort T_2, no reads by T_1 would be affected by reverting T_2.
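The recoverability condition (a transaction may commit only after every transaction it read from has committed) can be checked mechanically. The sketch below is a simplification: the parser and the names `parse` and `is_recoverable` are assumptions for this example, and aborts are only handled in the sense that an aborted writer never counts as committed.

```python
import re

def parse(text):
    """Parse 'r1(X); c2; a1' into [(op, txn, item_or_None), ...] for ops r, w, c, a."""
    return [(op, int(t), item or None)
            for op, t, item in re.findall(r"([rwca])_?(\d+)(?:\((\w+)\))?", text)]

def is_recoverable(ops):
    last_writer = {}    # item -> transaction that wrote it most recently
    reads_from = {}     # txn -> set of transactions whose writes it has read
    committed = set()
    for op, t, x in ops:
        if op == "w":
            last_writer[x] = t
        elif op == "r":
            w = last_writer.get(x)
            if w is not None and w != t:
                reads_from.setdefault(t, set()).add(w)
        elif op == "c":
            # Committing before a transaction we read from has committed is unsafe.
            if any(w not in committed for w in reads_from.get(t, ())):
                return False
            committed.add(t)
    return True

sc = parse("r1(X); w1(X); r2(X); r1(Y); w2(X); c2; a1")
sd = parse("r1(X); w1(X); r2(X); r1(Y); w2(X); w1(Y); c1; c2")
print(is_recoverable(sc), is_recoverable(sd))
```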

What is the replication characteristic of NoSQL? What about sharding?

Replication creates additional copies of the data and allows for automatic failover to another node. Sharding allows for horizontal scaling of data writes by partitioning data across multiple servers using a shard key
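One common way to route records by shard key is to hash the key and take the result modulo the number of shards. The sketch below shows only that idea. The function names, the `user_id` key, and the choice of SHA-256 are assumptions for illustration, and real systems may use range-based or consistent hashing instead.

```python
import hashlib

def shard_for(key, n_shards):
    """Map a shard key (string) to a shard index in [0, n_shards)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_shards

def partition(records, n_shards, shard_key):
    """Distribute records across n_shards lists according to their shard key."""
    shards = [[] for _ in range(n_shards)]
    for rec in records:
        shards[shard_for(str(rec[shard_key]), n_shards)].append(rec)
    return shards

users = [{"user_id": i, "name": f"user{i}"} for i in range(100)]
shards = partition(users, 4, "user_id")
print([len(s) for s in shards])  # how many records landed on each shard
```

Because the hash is deterministic, every read or write for a given key is sent to the same server, which is what lets writes scale horizontally.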

Why NoSQL?

SQL not sufficient for Big Data, we need a better method

How can we make this schedule cascadeless? Sd: r1(X); w1(X); r2(X); r1(Y); w2(X); w1(Y); c1; c2;

Se: r1(X); w1(X); r1(Y); w1(Y); c1; r2(X); w2(X); c2;
We delay T_2's read of X until T_1 has committed, which means we don't need to roll back changes made by T_2 in the event that T_1 aborts.
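A schedule is cascadeless when every read sees only data written by committed transactions (or by the reader itself). A minimal check, assuming operations are given as `(op, txn, item)` tuples with `c` for commit; the name `is_cascadeless` is made up for this sketch.

```python
def is_cascadeless(ops):
    """Return False as soon as a transaction reads an item whose last writer is uncommitted."""
    last_writer = {}   # item -> transaction that wrote it most recently
    committed = set()
    for op, t, x in ops:
        if op == "w":
            last_writer[x] = t
        elif op == "c":
            committed.add(t)
        elif op == "r":
            w = last_writer.get(x)
            if w is not None and w != t and w not in committed:
                return False
    return True

sd = [("r", 1, "X"), ("w", 1, "X"), ("r", 2, "X"), ("r", 1, "Y"),
      ("w", 2, "X"), ("w", 1, "Y"), ("c", 1, None), ("c", 2, None)]
se = [("r", 1, "X"), ("w", 1, "X"), ("r", 1, "Y"), ("w", 1, "Y"),
      ("c", 1, None), ("r", 2, "X"), ("w", 2, "X"), ("c", 2, None)]
print(is_cascadeless(sd), is_cascadeless(se))
```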

Schemaless XML documents

Semi-structured documents without a predefined schema. Denoted by the attribute 'standalone="yes"' in the XML declaration on the top line.

In XML what is a simple element

Simple elements hold data values within them

Wide-column or Wide-row DB NoSQL?

Stores data in columns. Offers high performance.

Document DB how does it work?

Stores data in key-value (document) form. Designed for storing, retrieving, and managing document-oriented information, oftentimes stored as JSON.

Key-Value DB how does it work?

Stored data consists of an indexed key and a value. Users perform query operations via key access.

What does the following line mean? <!ATTLIST Project number ID #REQUIRED>

The element Project has an attribute called "number", which is required. The type of "number" is a unique ID, which can be used to refer to this Project, sort of like a primary key.

What does this mean? <!ELEMENT Name (#PCDATA)>

The leaf tag called Name will contain PCDATA, which stands for parsed character data. The data could be empty; otherwise the PCDATA sits between the start and end tags.

r_1(X) means? w_2(Y) means? what does b_1 mean? What does c_2 mean? a_1 means?

r_1(X) - transaction 1, i.e., T_1, performs a read on data item X
w_2(Y) - transaction 2, i.e., T_2, performs a write on data item Y
b_1 - begin transaction T_1
c_2 - commit transaction T_2
a_1 - abort transaction T_1

Conflict Equivalent

Two schedules are said to be conflict equivalent if the order of any two conflicting operations is the same in both schedules
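One way to make this definition concrete is to collect every ordered pair of conflicting operations in each schedule and compare the two sets. The sketch below assumes both inputs are valid schedules of the same transactions, given as `(op, txn, item)` tuples; the helper names are hypothetical.

```python
from collections import Counter

def conflict_pairs(ops):
    """Return the set of ordered pairs (a, b) where a precedes b and the two conflict."""
    seen = Counter()
    labelled = []
    for op in ops:
        seen[op] += 1
        labelled.append(op + (seen[op],))   # occurrence number tells repeated ops apart
    pairs = set()
    for i, a in enumerate(labelled):
        for b in labelled[i + 1:]:
            if a[1] != b[1] and a[2] == b[2] and "w" in (a[0], b[0]):
                pairs.add((a, b))
    return pairs

def conflict_equivalent(s1, s2):
    """Same operations, and every conflicting pair appears in the same order."""
    return Counter(s1) == Counter(s2) and conflict_pairs(s1) == conflict_pairs(s2)

s_a = [("r", 1, "X"), ("r", 2, "X"), ("w", 1, "X"),
       ("r", 1, "Y"), ("w", 2, "X"), ("w", 1, "Y")]
serial_t1_t2 = [("r", 1, "X"), ("w", 1, "X"), ("r", 1, "Y"),
                ("w", 1, "Y"), ("r", 2, "X"), ("w", 2, "X")]
print(conflict_equivalent(s_a, serial_t1_t2))
```

Here r2(X) precedes w1(X) in S_a but follows it in the serial schedule, so the two are not conflict equivalent.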

Attributes of XML tag should be used for what purpose?

Use attributes for information that describes/modifies the element

What is DTD (Document Type Definition)?

Used to define the schema of an XML document. Each possible element in the document is defined:
◦ What children must it have?
◦ What children can it (optionally) have?
◦ What kinds of attributes can/must it have?
◦ If it is a leaf element, what kinds of values can it have?

What is a complete schedule?

We call a schedule a complete schedule if:
1. The operations in the schedule are exactly those of the transactions that make up the schedule.
2. For any pair of operations within the same transaction, their order in the schedule is the SAME as their order in the transaction.
3. For any pair of conflicting operations, one of the operations must occur before the other in the schedule.

What are the three common languages used to query data in XML documents?

XPath, XQuery, XSLT

What is XPath?

XPath is a simple query language used to select parts of an XML document. XPath queries are written as paths through an XML document:
◦ Nodes are separated by '/' characters
◦ The result returned is whatever is at the end of the XPath expression
Examples:
/companyDB/employees/employee/@supervisor
/companyDB/employees/employee/*
/companyDB/employees/employee[starts-with(lname,"S")]
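The three example queries can be approximated with Python's standard `xml.etree.ElementTree`, which supports only a subset of XPath: it cannot end a path in `@attribute` and has no `starts-with()`, so those two parts are done in plain Python here. The sample document is made up for illustration.

```python
import xml.etree.ElementTree as ET

doc = """<companyDB>
  <employees>
    <employee supervisor="E1"><fname>Ann</fname><lname>Smith</lname></employee>
    <employee supervisor="E2"><fname>Bob</fname><lname>Jones</lname></employee>
  </employees>
</companyDB>"""

root = ET.fromstring(doc)  # root is the <companyDB> element, so paths are relative to it

# /companyDB/employees/employee/@supervisor
employees = root.findall("./employees/employee")
supervisors = [e.get("supervisor") for e in employees]

# /companyDB/employees/employee/*
children = [c.tag for c in root.findall("./employees/employee/*")]

# /companyDB/employees/employee[starts-with(lname,"S")]
s_names = [e.findtext("lname") for e in employees
           if e.findtext("lname").startswith("S")]

print(supervisors, children, s_names)
```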

What is XQuery?

XQuery is an extension of XPath:
◦ Uses the same data model as XPath
◦ Supports join operations
◦ Supports aggregate functions
◦ Supports conditional branching (if - then)
Example, Query 2: Get employee names in the "Research" department.
let $d:=doc("/Users/raj/company.xml")
let $r:=$d/companyDB/departments/department[dname="Research"]
for $e in $d/companyDB/employees/employee
where $e/@worksFor=$r/@dno
return {$e/lname}{$e/fname}

Is this recoverable? Sa: r1(X); r2(X); w1(X); r1(Y); w2(X); w1(Y); c1; c2; If not how can we fix it?

Yes. T_2 reads X before T_1 writes it, so T_2 never reads a value written by T_1. If we roll back T_1, the result of T_2 is unaffected and T_2's operations can be left in place.

What does the REDO operation do?

Certain operations must be redone to ensure that all operations of a committed transaction have been applied successfully to the DB.

What is big data?

Characterized by data with huge volume, variety, and velocity. Too large for SQL.

Internal nodes of an XML tree are called?

complex elements

Cascadeless rollbacks

every transaction in the schedule reads only items that were written by transactions that have committed

Data Mining/Warehousing

gather streams of data, consolidating them in a cluster that feeds a Data warehouse for analyses and prediction

XML stores data in a __________ model

hierarchical model aka tree model

Why do we use XML when working with databases?

it provides a data exchange framework Moving data from one application to another, from one database to another Taking data from a database and turning it into a website, a report, or other human readable document

Storing data in XML is much like how we use HTML in that the element/tag contents show the data to be displayed, while the attributes should ____________ how the data is displayed

modify/describe

A schedule is a _______ order of the operations of a set of transactions

partial

Once all read/write operation occurs in a transaction, the transaction moves from the "active" state to what state?

partially committed

In the event of transaction failure what is the job of the transaction manager?

Protect the data from loss or damage. Typically this means reverting to the consistent state before the transaction started.

Recoverable schedule vs nonrecoverable?

Recoverable - once we commit a transaction, we should never have to roll back that transaction.
Nonrecoverable - the opposite: even if a transaction is committed, we might need to roll it back. We don't want this.

Once a transaction T is committed, it should never be necessary to ___________

rollback

In NoSQL we look for what characteristics in a solution?

• Scalability
• Availability, replication, and eventual consistency
• Sharding of files
• High-performance data access
• Schema not required
• Less powerful query languages

leaves of XML tree are called?

simple elements

Graph DB

Stores data in nodes/vertices (ex. Google Maps). Relationships are persisted in the data store in the form of edges, so we just have to fetch them. There is no need to run any sort of computation at query time, which leads to low latency. Designed for highly complex and connected data, which outpaces the relationship and JOIN capabilities of an RDBMS. Exceptionally good at finding commonalities and anomalies among large data sets.

What are the three V's or requirements for distributed databases/storage systems?

Volume, variety, velocity

