Database Final
How many duplicate values are allowed in attribute which is a primary key?
0
XML can have this many root elements
1
max number of clustered indexes per table is...
1
Common HDD platter rotation speed is
5400, 7200, 10000 or 15000 RPM
Create View SQL creates
A virtual table without copying data
RDBMSs support the following property
ACID
INNER JOINs are used when we want to keep
All matching tuples of both tables
FULL JOINs are used when we want to keep
All tuples from both tables, regardless of matching tuples
LEFT JOINs are used when we want to keep
All tuples of table on the LEFT, regardless of matching tuples
RIGHT JOINs are used when we want to keep
All tuples of table on the RIGHT, regardless of matching tuples
ACID stands for
Atomicity, Consistency, Isolation, Durability
What is the correct ordering of record, disk block, attribute value?
Attribute value < record < disk block
NoSQL supports the following property
BASE
NoSQL BASE stands for
Basically available, soft state, eventual consistency
____ specify structure of XML document
Both DTD and XSD
SQL consists of
DDL, DML, DCL and TCL
What concept specifies the various data items, their characteristics, and relationships, constraints, storage details
Database model
DTD stands for
Document Type Definition
XML stands for
Extensible Markup Language
XSLT stands for
Extensible stylesheet language transformation
ER Diagrams are used for archiving purposes only (T/F)
False
Which of the following statements is correct? a. DataNodes in HDFS store a registry of metadata. b. The HDFS NameNode sends regular heartbeat messages to its DataNodes. c. HDFS is composed of a NameNode, DataNodes, and an optional SecondaryNameNode. d. Both the SecondaryNameNode and primary NameNode can simultaneously handle requests from clients.
HDFS is composed of a NameNode, DataNodes, and an optional SecondaryNameNode.
In SQL, subqueries may be used
In SELECT, FROM and WHERE
T1 calculates a summary of DB items. Other transactinos are updating some of these items. This is a
Incorrect summary
System database
Is NOT (none above) physical structure determined by DB designer, contain data that represents state of business process, or the logical struct determined by db designer
When an SQL view is created
It exists as a stored SQL query and can be queried
In practice, these OUTER joins are used most frequently
LEFT JOIN
XPath is
Language that refers to parts of an XML document
When item x has incorrect value because its update by T1 is overwritten by T2, this is a
Lost update
DBMS buffers are located in
Main Memory
Is NOT NULL constraint required for a column which is a PRIMARY KEY?
No
Does primary key allow NULLs?
No / False
NoSQL stands for
Not Only SQL
User database
Physical structure determined by db designer, contain data that represents the state of business process, and logical structure designed by DB designer (all above)
RDBMS stands for
Relational Database Management System
If sorted by speed, what is the correct order?
SSD > RAM > CPU Cache > CPU Registers
In the relational model
Specialization, Categorization, Aggregation is not supported (all above)
Attribute which a foreign key may contain NULLs (T/F)
True
ER Diagrams commonly use Chen or Crows foot may also be in UML (T/F)
True
Foreign key must be of the same data type as PK which FK refers to (T/F)
True
SQL is pretty much the same for all relational databases
True
T1 reads x twice, T2 changes X's value between reads, this is a
Unrepeatable read
XSD stands for
XML Schema Definition
Every logical table is assigned to
a tablespace
Which of the following statements is correct? a. HBase can be considered as a NoSQL database. b. HBase offers an SQL engine to query its data. c. MapReduce programs cannot be used with HBase. Data are accessed using simple put and get commands instead. d. HBase works well on large clusters as well as small ones having a few nodes
a. HBase can be considered as a NoSQL database.
________ consist of 2 or more adjacent sectors
allocation unit, cluster (page) and disk block (all above)
When using subqueries, nesting is supported
any level of nesting is fine
In HDD a cylinder is
any set of all tracks of equal diameter
In the embedded identification file organization
attributes are always preceded by attribute types
Which statement is correct? a. A star schema has one large central dimension table which is connected to various smaller fact tables. b. A star schema has one or more large fact tables which are connected to any number of smaller dimension tables. c. To speed up report generation and avoid time-consuming joins in a star schema, the dimension tables need to be normalized. d. The dimension tables in a star schema are frequently updated.
b. A star schema has one or more large fact tables which are connected to any number of smaller dimension tables.
In terms of data manipulation, a data warehouse focuses on... a. Insert/Update/Delete/Select statements. b. Insert/Select statements. c. Select/Update statements. d. Delete statements.
b. Insert/Select statements.
Which statement is not correct? a. A transaction is a set of database operations (e.g., a consecution of SQL statements in a relational database), induced by a single user or application, that should be considered as one undividable unit of work. b. Transactions typically exist in isolation, and cannot be executed concurrently with other transactions on the same data. c. It should not be possible to terminate a transaction in such a way that the database remains in an inconsistent state, because some operations of a single transaction were executed successfully and others were not. d. Recovery is the activity of ensuring that, whichever problem occurred, the database is returned to a consistent state afterwards, without any data loss.
b. Transactions typically exist in isolation, and cannot be executed concurrently with other transactions on the same data.
Pig is... a. a programming language that can be used to query HDFS data. b. a project offering a programming language to provide more user-friendliness compared to MapReduce programs. c. a database that runs on Hadoop. d. an SQL engine that runs on top of Hadoop
b. a project offering a programming language to provide more user-friendliness compared to MapReduce programs.
14.2. When a transaction is aborted, it is important that... a. all changes made by the individual operations belonging to that transaction should be made permanent. b. a rollback of the transaction is executed: all changes made by the transaction's respective operations should be undone
b. a rollback of the transaction is executed: all changes made by the transaction's respective operations should be undone
Which statement about ETL is not correct? a. Some estimates state that the ETL step can consume up to 80% of all efforts needed to set up a data warehouse. b. To decrease the burden on both the operational systems and the data warehouse itself, it is recommended to start the ETL process by dumping the data in a staging area where all the ETL activities can be executed. c. During the loading step, the data warehouse is populated by filling the fact and dimension tables, thereby also generating the necessary surrogate keys to link it all up. Fact rows should be inserted/ updated before the dimension rows. d. The extraction strategy can be either full or incremental. In the latter case, only the changes since the previous extraction are considered.
c. During the loading step, the data warehouse is populated by filling the fact and dimension tables, thereby also generating the necessary surrogate keys to link it all up. Fact rows should be inserted/ updated before the dimension rows.
Which of the following statements is not correct? a. A mapper in Hadoop maps each element in a collection to one or more output elements. b. A reducer in Hadoop reduces a collection of elements to one or more output elements. c. Reducer workers in Hadoop will start once all mapper workers have finished. d. A MapReduce pipeline in Hadoop can include an option
c. Reducer workers in Hadoop will start once all mapper workers have finished.
Configuration and tuning of DB parameters...
can be done by a DBA, autoconfigured by a DBA, and is a DBA's responsibility (all above)
Transactions...
can be either read-only, read-write, and can have begin and end transaction statements (all above)
In XML, tag sequences
cannot be overlapped
ER Diagrams are used for
communication with a client
the limitation of a relational model is
complex objects are difficult to handle
In NoSQL BASE property, Eventual consistency is
consistency at some point after transaction
Pointers and lists file organization
contains references to overflow locations
Which of the following is not one of the reasons why Spark programs are generally faster than MapReduce operations? a. Because Spark tries to keep its RDDs in memory as long as possible. b. Because Spark uses a directed acyclic graph instead of MapReduce. c. Because RDD transformations are "lazily" applied. d. Because Mesos can be used as a resource manager instead of YARN.
d. Because Mesos can be used as a resource manager instead of YARN.
Which of the following statements is not correct? a. Hive offers an SQL engine to query Hadoop data. b. Hive's query language is not as feature-complete as the full SQL standard. c. Hive offers a JDBC interface. d. Hive queries run much faster than hand-written MapReduce programs.
d. Hive queries run much faster than hand-written MapReduce programs.
Which of the following statements is not correct? a. Spark SQL exposes DataFrame and Dataset APIs which underlyingly use RDDs together with a performant SQL query engine. b. Spark SQL can be used from within Java, Python, Scala, and R. c. Spark SQL can be used through ODBC and JDBC interfaces. d. Spark SQL DataFrames need to be created by loading a file
d. Spark SQL DataFrames need to be created by loading a file
In relative location file organization
data items of the same entity are stored adjacently
physical order of table records in storage is...
determined by primary or clustered index
high-performance DBMSs communicate to storage devices...
directly
T1 updates x, T2 reads x, T1 fails and is rolled back, this is a
dirty read
DBMSs performance tuning utilities...
distribute data and tables over storage media, detect and resolve I/O problems of storage devices, and provide index-tuning wizards (all above)
Reasons for index creation...
efficient retrieval of rows, performance of join queries, and enforce uniqueness (all above)
Within a file, records are commonly organized using
embedded identification, pointers and lists, and relative location (all above)
indxes have a cost of...
extra storage and data overhead
For audio, video and text, the relational model
has poor support
NoSQL DBs support well
horizontal scaling
SQL queries with EXIST checks
if the result of a correlated nested query is empty or not
NoSQL databases support data
in non-tabular format
Transactions consist of 1 or many
insertions or deletions, updates, and retrievals (all above)
Tablespace
is a physical container of DB objects, may consist of 1 or more files, and may be distributed over several objects (all above)
NoSQL databases may support data in forms (there are 2 of this question)
key-value based, document based, or column based (all above) OR Graph based, XML and object based, or time series (all above)
max number of secondary indexes per table is...
many
Transactions submitted by various users
may execute concurrently, access the same DB items and update the same DB items (all above)
XML tags
must always be opened, closed, and correctly nested (all above)
Database is represented as a collection of
named data items
XSLT transforms XML documents to
other XML documents, HTML web pages, or plain text (all above)
Relational model
required tuples to be normalized
The smallest addressable unit of an HDD is
sector
HDD service time is
seek time + rotational delay + transfer time
HDD response time is
service time + queueing time
Random block access is _________ than sequential block access
slower
Granularity is
the size of a data item
Relational model has these constructors
tuple constructor and set constructor
The comparison condition v > ALL V returns TRUE if
value v is greater than all values in multiset V
The comparison condition v > ANY V returns TRUE if
value v is greater than at least one value in multiset V
Relational model is also known as
value-based model
RDBMSs support well
vertical scaling
SQL Views are
virtual tables without copying data
Primary storage is _____ while secondary storage is _______
volatile / persistent
When computer reboots the content of
volatile storage is erased and persistent storage is preserved