ie6700
Refer to the key-value pairs below. [email protected],'/User/pics/flower3.jpg' [email protected],'/User/pics/truck3.jpg' [email protected],'/User/pics/car4.jpg' [email protected],'/User/pics/cape8.jpg' [email protected],'/User/pics/field.jpg' [email protected],'/User/pics/flower42.jpg' [email protected],'/User/pics/hat5.jpg' [email protected],'/User/pics/door9.jpg' What is the returned value after executing: get('[email protected]'). /User/pics/car4.jpg /User/pics/door9.jpg '/User/pics/field.jpg' /User/pics/car4.jpg
'/User/pics/field.jpg'
Country How many rows are in the result table of the following expressions? CountryA ∩ CountryB 1 0 3 9
0
Given the table: Country How many rows are returned by the Query? SELECT * FROM Country WHERE Population IS NULL; 0 1 3
1
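A minimal sketch of how `IS NULL` behaves, using Python's built-in `sqlite3` module. The Country table is not reproduced in these notes, so the rows below are invented for illustration, with exactly one missing Population.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Country (Name TEXT, Population INTEGER)")
# Hypothetical rows; only the second one has an unknown population.
conn.executemany(
    "INSERT INTO Country VALUES (?, ?)",
    [("Alpha", 1000), ("Beta", None), ("Gamma", 2500)],
)

# IS NULL matches rows whose Population is missing.
null_rows = conn.execute(
    "SELECT * FROM Country WHERE Population IS NULL"
).fetchall()

# A comparison with = NULL never evaluates to true, so it matches nothing.
equals_null_rows = conn.execute(
    "SELECT * FROM Country WHERE Population = NULL"
).fetchall()

print(len(null_rows), len(equals_null_rows))
```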
Refer to the Employee and Student tables below for the activities. How many rows are in the table defined by the following expression? Employee ∩ Student 0 7 1
1
1. Nurse to Hospitals 2. Doctors to Patients 3. Books to readers 4. People to social security numbers 5. Teams to players
1. One-to-many 2. Many-to-many 3. Many-to-many 4. One-to-one 5. One-to-many
Refer to the tables below. How many rows are in the table defined by the following expression? Department × Course a. 2 b. 8 c. 4 d. 12
12
Refer to the Employee and Student tables below for the activities. How many rows are in the table defined by the following expression? Employee − Student 0 2 3
2
Refer to the Employee and Student tables below for the activities. Maria Rodriguez takes an unpaid leave. She continues as an employee, but her salary is now 0. How many rows are in the table defined by the following expression? Employee − Student 3 0 2
2
Refer to the tables below. How many course names are selected by the following expression? Π(CourseName)(σ(DepartmentName='Art and Architecture')(Department ⋈Department.Code=Course.Code Course)) 1 2 4
2
Country How many rows are in the result table of the following expressions? CountryA - CountryB 2 6 9 3
3
Refer to the Employee and Student tables below for the activities. How many rows are in the table defined by the following expression? Student ∪ Student 0 4 8
4
Consider the following Document database which uses the last digit of the value of Class as the sharding key. List the shard key followed by any matching documents for that shard based on this design - (format your answer as: <shard key>: <id1,id2...> ) 0: a8s4; 6: eg58, a8s4 6: a8s4, eg58; 9: bks2; 5: cct1; 3: dxh1 1: a8s4, 6: a8s4, eg58; 9: bks2; 5: cct1; 3: dxh1 6: a8s4, eg58; 9: bks2; 4: cct1; 3: dxh1
6: a8s4, eg58; 9: bks2; 5: cct1; 3: dxh1
Country How many rows are in the result table of the following expressions? CountryA ∪ CountryB 3 9 6 27
9
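The three CountryA / CountryB answers (intersection 0, difference 3, union 9) are consistent with two tables that share no rows. A rough sketch with Python sets, assuming invented rows (the real tables are not shown here), 3 in CountryA and 6 in CountryB:

```python
# Hypothetical rows: each row is a (code, name) tuple; no row appears in both tables.
country_a = {(1, "Country 1"), (2, "Country 2"), (3, "Country 3")}
country_b = {(n, f"Country {n}") for n in range(4, 10)}

intersection = country_a & country_b   # rows in both tables
difference = country_a - country_b     # rows in CountryA but not in CountryB
union = country_a | country_b          # rows in either table, duplicates removed

# Union with itself adds nothing, because relational union eliminates duplicates.
self_union = country_a | country_a

print(len(intersection), len(difference), len(union), len(self_union))
```

The last line is the same reasoning behind Student ∪ Student having exactly as many rows as Student.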
Which is NOT a component of a running Jupyter notebook application? Executable code cells in Python, R or Julia in a browser or similar client GUI support Markdown text for documentation in a web browser or similar client GUI support A Jupyter notebook compiler or interpreter running on your computer A Jupyter notebook server running locally or remotely
A Jupyter notebook compiler or interpreter running on your computer
Which statement is NOT CORRECT? In a conceptual data model, the data requirements from the business should be captured and modeled. A conceptual data model is implementation dependent. A logical data model translates the conceptual data model to a specific implementation environment. Examples of implementations of logical data models are hierarchical, CODASYL, relational or object-oriented models.
A conceptual data model is implementation dependent.
Which of the following is a property of a good hash function for use in key-value based storage structures? A good hash function will involve a complex calculation A good hash function should map its inputs as evenly as possible over the output range With different inputs, a hash function should return output that differs very little A hash function must preferentially distribute output to one server
A good hash function should map its inputs as evenly as possible over the output range
BASE stands for _A_ available, soft state, eventually _B_. A. Basic, B. Consistent A. Basically, B. Consistent A. Atomic, B. Durable
A. Basically, B. Consistent
Redis is a _A_ data store and MongoDB is a _B_ store. A. key-value, B. document A. document, B. key-value A. key-value, B. wide-column
A. key-value, B. document
Which is a property of wide-column NoSQL databases? They use json formatted files It is difficult to perform analysis on column values Entire rows can be loaded with a single operation Architectures can be designed to reduce space taken up by null values.
Architectures can be designed to reduce space taken up by null values.
Which is NOT a format of data for a dataset employed in a Data Analysis task? Big Data Semi-structure Structured Unstructured
Big Data
Refer to the "autos" collection below, and choose the result of each command. ==================== [ { "_id" : 100, "make" : "Ford", "model" : "Fusion", "year" : 2014, "options" : [ "engine start", "moon roof" ], "price" : 13500 }, { "_id" : 200, "make" : "Honda", "model" : "Accord", "year" : 2013, "options" : [ "spoiler", "alloy wheels", "sunroof" ], "price" : 16900 } ] ==================== db.autos.updateMany({ price: { $gt: 10000} }, { $set: { year: 2000, options: [] }}) Only auto with _id 100 has year set to 2000 and options removed Both autos have year set to 2000 and options removed. No autos are updated.
Both autos have year set to 2000 and options removed
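To see why both documents change, here is a plain-Python emulation of the filter and `$set` step. It is not the MongoDB driver; `update_many_price_gt` is a hypothetical helper written only for this illustration, applied to the two documents from the question.

```python
import copy

autos = [
    {"_id": 100, "make": "Ford", "model": "Fusion", "year": 2014,
     "options": ["engine start", "moon roof"], "price": 13500},
    {"_id": 200, "make": "Honda", "model": "Accord", "year": 2013,
     "options": ["spoiler", "alloy wheels", "sunroof"], "price": 16900},
]

def update_many_price_gt(docs, threshold, changes):
    """Overwrite the fields in `changes` on every doc with price > threshold."""
    matched = 0
    for doc in docs:
        # A document without a price never satisfies the greater-than filter.
        if "price" in doc and doc["price"] > threshold:
            doc.update(copy.deepcopy(changes))
            matched += 1
    return matched

# Mirrors: { price: { $gt: 10000 } }, { $set: { year: 2000, options: [] } }
matched = update_many_price_gt(autos, 10000, {"year": 2000, "options": []})
print(matched, [(d["_id"], d["year"], d["options"]) for d in autos])
```

Both prices (13500 and 16900) exceed 10000, so both documents match the filter.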
Which of these is not a part of the Storage Manager in the DBMS Architecture? Connection Manager Transaction Manager Buffer Manager Recovery Manager
Connection Manager
handles: Create table Product. DCL DTL DDL DQL DML
DDL - Data Definition Language
handles selecting all rows from table Product DCL DTL DDL DQL DML
DQL - Data Query Language
Servers are set up for data partitioning using a ring topology as shown in a). A new server is added (Server 3) as shown in b). Which data will be rehashed to the new server? a) image b) image Data between position 0.58 and 0.75 will be moved from Server 0 to Server 3. Equal amounts of data from Server 0, 1, and 2 will be rehashed to Server 3 Data between position 0.75 and 0.99 All the data from positions 0.00 to 0.75
Data between position 0.58 and 0.75 will be moved from Server 0 to Server 3.
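A rough sketch of the ring lookup behind this answer. The figure is not reproduced in these notes, so the server positions are assumptions: only 0.58 and 0.75 come from the answer text, while 0.33 and 0.99 are made up so that each key goes to the first server at or after its position.

```python
import bisect

def owner(ring, position):
    """Return the first server at or after `position`, wrapping around the ring."""
    points = sorted(ring)
    idx = bisect.bisect_left(points, position) % len(points)
    return ring[points[idx]]

# Assumed layout before the change (positions are hypothetical).
ring_before = {0.33: "Server 2", 0.58: "Server 1", 0.99: "Server 0"}

# Server 3 joins at position 0.75.
ring_after = dict(ring_before)
ring_after[0.75] = "Server 3"

sample = [0.10, 0.25, 0.50, 0.60, 0.70, 0.80, 0.95]
moved = [p for p in sample if owner(ring_before, p) != owner(ring_after, p)]
print(moved)
```

Under these assumptions only the sample positions between 0.58 and 0.75 change owner, from Server 0 to Server 3; every other key stays where it was.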
A pandas two-dimensional labelled array that is an ordered collection of columns to store heterogeneous data type is a: Numpy object Series object DataFrame object Panel object
DataFrame object
Refer to the table definition below. Which statement correctly inserts an unnamed department with no manager? INSERT INTO Department (Name, ManagerID) VALUES ('', NULL); INSERT INTO Department VALUES (NULL, '', NULL); INSERT INTO Department (Name, ManagerID) VALUES ('');
INSERT INTO Department (Name, ManagerID) VALUES ('', NULL);
Refer to the table definition below. CREATE TABLE Department ( Code TINYINT UNSIGNED AUTO_INCREMENT, Name VARCHAR(20) NOT NULL, ManagerID SMALLINT UNSIGNED, PRIMARY KEY (Code)); Which statement correctly inserts Engineering? INSERT INTO Department (Code, Name, ManagerID) VALUES (44, 'Engineering', 2538); INSERT INTO Department VALUES ('Engineering', 2538); INSERT INTO Department (Name, ManagerID) VALUES ('Engineering', 2538);
INSERT INTO Department (Name, ManagerID) VALUES ('Engineering', 2538);
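The two Department inserts can be tried out with Python's built-in `sqlite3`. This is only a stand-in for the MySQL definition above: SQLite has no `AUTO_INCREMENT` keyword, so `INTEGER PRIMARY KEY` is assumed here to play the same role of generating Code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite stand-in: INTEGER PRIMARY KEY auto-assigns Code when it is omitted.
conn.execute(
    "CREATE TABLE Department ("
    " Code INTEGER PRIMARY KEY,"
    " Name VARCHAR(20) NOT NULL,"
    " ManagerID SMALLINT)"
)

# Naming the columns lets the database generate Code.
conn.execute("INSERT INTO Department (Name, ManagerID) VALUES ('Engineering', 2538)")
# An empty string satisfies NOT NULL; NULL means "no manager".
conn.execute("INSERT INTO Department (Name, ManagerID) VALUES ('', NULL)")

# Without a column list, a value is required for every column, so this fails.
try:
    conn.execute("INSERT INTO Department VALUES ('Engineering', 2538)")
    short_insert_error = None
except sqlite3.OperationalError as exc:
    short_insert_error = str(exc)

rows = conn.execute("SELECT Code, Name, ManagerID FROM Department ORDER BY Code").fetchall()
print(rows, short_insert_error)
```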
In a key-value database, the key: Uniquely identifies a single value. Specifies the location of a value in storage. Is unique and also specifies the location of a value.
Is unique and also specifies the location of a value.
One challenge with a distributed database system is that: It needs to accommodate computers being added and going offline Replication and redundancy are not possible It cannot achieve consistency No answer text provided.
It needs to accommodate computers being added and going offline
What data format is used in storing a Jupyter notebook (.ipynb) file? unstructured text file CSV text file JSON format XML format
JSON format
What is true about graph databases? Joins can be replaced by graph traversal algorithms Graph databases cannot store relational data Neo4j express unstructured data using JSON file format Graph databases are not able to model many-to-many relationships
Joins can be replaced by graph traversal algorithms
Are NULLs allowed in the foreign key column? Yes No Cannot be determined from the diagram
Yes, because participation of the related entity (Aircraft) in the relationship AssignedTo is optional
s(QUANTITY<5000)(Book) -- s stands for sigma Complete the equivalent SQL statement below: SELECT /* Your code here */ FROM Book WHERE /* and here */ ; a. SELECT * FROM Book WHERE Quantity < 5000; b. SELECT Quantity FROM Book WHERE Quantity < 5000; c. SELECT SUM(Quantity<5000) FROM Book
a
A key-value store implements: a loose schema on read a strict schema many joins relational tables
a loose schema on read
A long text document like the story, Alice in Wonderland is: impossible to search an example of structured data a data definition language an example of unstructured data that does not lend itself well to a DBMS
an example of unstructured data that does not lend itself well to a DBMS
You work for a content company that produces a continuous stream of data about users and their browsing history on your company's sites. For example, you want to be able to perform real-time analytics on the popularity of pages and the profiles of users that visit those pages. You need a place to store the stream of data temporarily so your analytics package can perform some analysis and then dump the results into a longer-term data store. Which of the following tools would be most well suited to the task? an in-memory key value store document database relational database a master-slave architecture
an in-memory key value store
r(Area)(gCOUNT(Area)(s(Area > 400)(City))) -- r is for rho and s is for sigma Complete the Equivalent SQL Query Below: SELECT /* Your code */ FROM City /* Rest of your code */ ; a. SELECT SUM(Area) as Area FROM City WHERE (Area > 400); b. SELECT COUNT(Area) FROM City WHERE (Area > 400); c. SELECT COUNT(Area) as Area FROM City WHERE (Area > 400); a b c
c
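A small check of option c using `sqlite3`, assuming an invented City table (the real one is not shown). The selection becomes the WHERE clause, the aggregate becomes COUNT, and the rename becomes the AS alias.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE City (CityName TEXT, Area INTEGER)")
# Hypothetical rows; two of them have Area > 400.
conn.executemany(
    "INSERT INTO City VALUES (?, ?)",
    [("North", 120), ("South", 450), ("East", 900), ("West", 300)],
)

cursor = conn.execute("SELECT COUNT(Area) AS Area FROM City WHERE (Area > 400)")
column_name = cursor.description[0][0]  # name of the single result column
count = cursor.fetchone()[0]
print(column_name, count)
```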
r(CityName,MaximumArea)(CityName gMAX(Area)(s(Area < 300)(City))) -- r is for rho and g for gamma Complete the Equivalent SQL Query Below: SELECT /* Your code */ FROM City /* Rest of your code */ ; a. SELECT MAX(Area) AS MaximumArea FROM City GROUP BY CityName; b. SELECT CityName, MAX(Area) AS MaximumArea FROM City GROUP BY CityName; c. SELECT CityName, MAX(Area) FROM City GROUP BY CityName;
c
In MongoDB and other document databases, what is analogous to a table in a relational database? document master collection JSON
collection
Replica sets can also be described as: copy sets that provide automated failover a master-slave architecture the same as sharding a copy of a relational data table with the same primary key
copy sets that provide automated failover
Which of the following MongoDB queries will output a list of all students in the database showing only their name and studentID? db.class.find({"name":1, "studentID":1}); db.class.list("name", studentID"); SELECT name, studentID from class db.class.find({},{"name":1, "studentID":1});
db.class.find({},{"name":1, "studentID":1});
In a distributed database system, Hash functions are used to: distribute the data as evenly as possible over the available nodes in a way that can be easily retrieved Delete entries from a node that is removed from a distributed database system Encrypt a relational database such as MySQL Make sure that ACID properties are maintained in the NoSQL database
distribute the data as evenly as possible over the available nodes in a way that can be easily retrieved
Which of the following is correct? document stores are built upon the same basic ideas as key-value stores Document stores require users to define document schemas before data can be inserted Document stores cannot provide SQL-like capabilities
document stores are built upon the same basic ideas as key-value stores
A lock manager can be used to: ensure concurrency control and consistency prevent unauthorized access to a database keep a portion of the database in memory compile the DML
ensure concurrency control and consistency
Our library system has several branches. Each branch has an address and a unique code. Each book belongs to a particular branch. Each book is assigned a unique barcode number and has a title and a publisher name. Each library member may borrow up to three books at a time. To join the library, a member provides an identification number and identification type, such as driver's license, passport number, or employee number. The entities are Branch, Book, and Member. What are the Attributes of the Member entity? _________ (Answer in format: x, y, z). Please use all lower case letters for attributes!
identification number, identification type
According to the CAP Theorem, if a NoSQL data tool scales over a large number of servers and is able to be highly available all the time, it will probably not have: an SQL coding language A large number of partitions a schema immediate consistency among replicas
immediate consistency among replicas
What is NOT a feature of modern computing that makes NoSQL databases possible and desirable? Big Data on the Internet Inexpensive operators Fast networks Inexpensive commodity hardware
inexpensive operators
The term replication in a distributed NoSQL environment refers to: making periodic backups of the database to a second system taking a large dataset and splitting it into different pieces a JSON or XML file type a many-to-many relationship construct from the relational model
making periodic backups of the database to a second system
Create a relational data model from this ERD showing primary and foreign keys. (Entering this into Canvas, you will have to indicate in words below the table definition.)
n/a
What are 2 advantages and 1 disadvantage of using triggers in a database?
n/a
You are working for a cell phone operator and are tasked with setting up their new backend customer subscription database. Using what you know about your cell phone service, consider the following questions: a) How many base service plans can a customer have? b) Can a plan have more than one customer associated with it? c) What type of relationship is the following: Customer-to-plan
n/a
a)What does ACID mean? What is an example of a type of database that would follow ACID properties? b)What is BASE? What is an example of a type of database that would follow BASE properties, and why? c)Define the CAP Theorem and describe how it relates to BASE?
n/a
A row in a relational database is most similar to a _________ in a graph database like Neo4j. node edge edge attribute collection
node
Given the following ER Model as a GRAPH Model: Fill in the blanks in the following: (Your answer should simply enter the question index and the required blank.) 'CreditCard' is a(n) ________ 'Owns' is a(n) ________. 'TotalCost' is a(n) ________. This is a(n) ________ graph (directed or undirected). node label, edge label, property name, undirected node, edge, property, directed node, edge, property, undirected
node label, edge label, property name, undirected
Hash functions are useful in NoSQL databases because they: permit horizontal scalability, a key characteristic of NoSQL tools Promote vertical scalability, a key characteristic of most NoSQL data tools are used in master-slave architectures require sophisticated computers to calculate the hash function
permit horizontal scalability, a key characteristic of NoSQL tools
The attribute (or set of attributes) that uniquely defines each row in a table is called the: primary key identifier symmetric key index
primary key
A ________ assigns contiguous ranges of shard key values to each shard. shard algorithm shard function range function index function
range function
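One way to picture a range function is a sorted list of boundaries with a binary search over it. The boundaries below are invented for illustration and do not come from any particular database.

```python
import bisect

# Hypothetical boundaries: shard 0 holds keys < 100, shard 1 holds 100..199,
# shard 2 holds 200..299, and shard 3 holds everything from 300 upward.
boundaries = [100, 200, 300]

def shard_for(key):
    """Return the index of the shard whose contiguous range contains `key`."""
    return bisect.bisect_right(boundaries, key)

print([shard_for(k) for k in (5, 99, 100, 250, 300, 1000)])
```

Because each shard owns a contiguous range, neighbouring keys land on the same shard, which suits range queries but can create hot spots when keys arrive in order.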
Most NoSQL databases support __________ to ensure high availability and disaster recovery primary keys replication distribution vertical scaling
replication
Which would be considered a first step in database design? requirement collection and analysis creating the internal data model choosing the category of database to use setting up data tables
requirement collection and analysis
In a data frame axis-0 refers to: columns rows Both rows and columns None of the above
rows
In a key value store like Redis, values can be: single values, lists, and sets strings lists and xml only single integers or strings
single values, lists, and sets
Which method is used to add all of the values in a particular column of a DataFrame? max() sum() min() mean()
sum()
Which of the following is not typically considered a characteristic of Big Data? volume velocity vividness variety
vividness
A row in a database is also called a domain True False
False
An SQL statement can implement only one relational operation. True False
False
Refer to the Employee table below for the activities. The expressions in questions 2 and 3 have the same operations in a different order. These are: s(Salary>50000)(r(Salary)(gSUM(Salary)(Employee))) r(Salary)(gSUM(Salary)(s(Salary>50000)(Employee))) Do both expressions define the same table? True False
False
The following command assigns _id an ObjectId: db.students.insertOne({_id: 123, "Ebony", gpa: 3.2 }) True False
False
The traditional transactional database systems do not provide any support for Data Analysis. True False
False
What's the value of the Expression: (8 % 3 + 10 > 15) AND TRUE True False
False
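The evaluation order can be checked step by step in Python, whose `%`, `+`, `>` and `and` follow the same precedence as the expression in the question:

```python
remainder = 8 % 3            # modulo binds tightest: 2
total = remainder + 10       # then addition: 12
comparison = total > 15      # then the comparison: False
result = comparison and True # AND with TRUE leaves the value unchanged

print(remainder, total, comparison, result)
```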
Which of the answers describes the output from the following SQL query: SELECT ProductDescription, Finish FROM Product_T WHERE Finish != 'Cherry'; List of the attributes ProductDescription and Finish for items that have Cherry finish. List of the attributes ProductDescription and Finish for items that do not have Cherry finish. Show product ID and Product_T when the Finish is not Cherry Select all columns from Product_T when Cherry is not the finish
List of the attributes ProductDescription and Finish for items that do not have Cherry finish.
Using Cypher, how do you get a list of all movies Marc Rigas has liked and for which he has given a rating of at least 4 stars? MATCH (b:User)-[L:LIKES]-(m:movie) WHERE b.name = "Marc Rigas" AND m.stars >= 4 RETURN m SELECT (b:User)--(m:Movie) WHERE m.name = "Marc Rigas" AND m.stars > 4 MATCH (b:User)-[l:LIKES]-(m:Movie) WHERE b.name = "Marc Rigas" AND l.stars >= 4 RETURN m MATCH (b:User)--()--(m:Movie) WHERE b.name = "Marc Rigas" AND m.stars >= 4 RETURN m
MATCH (b:User)-[l:LIKES]-(m:Movie) WHERE b.name = "Marc Rigas" AND l.stars >= 4 RETURN m
Refer to the statement below. CREATE TABLE Department ( Code TINYINT UNSIGNED NOT NULL, Name VARCHAR(20), ManagerID SMALLINT ); ManagerID NOT NULL SMALLINT ManagerID NOT NULL ManagerID SMALLINT NOT NULL
ManagerID SMALLINT NOT NULL
Which NOSQL data system can easily store semi-structured documents imported from JSON files? MongoDB MySQL Redis SQL server
MongoDB
Refer to the tables below Family PhoneNumber Can (ID, Relationship) be the primary key of Family? Yes No Cannot be determined from the data in the table
No
Which is the most popular open-source Python library used for doing data analysis? Numpy Matplotlib Pandas Scipy
Pandas
Given the hash function: (key) mod n where n is the number of server partitions, compute where you would send a key value pair on a 3-partition system when the key = 24. Partition 0 Partition 1 Partition 2 Partition 3
Partition 0
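The arithmetic is a single modulo operation. A minimal sketch, where `partition_for` is just an illustrative name:

```python
def partition_for(key, n_partitions):
    """Map an integer key to a partition index with (key) mod n."""
    return key % n_partitions

# 24 divides evenly by 3, so the remainder is 0.
print(partition_for(24, 3))
# With 3 partitions the only possible results are 0, 1 and 2.
print(sorted({partition_for(k, 3) for k in range(100)}))
```

This is also why "Partition 3" cannot be correct: a remainder modulo 3 is never 3.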
Refer to the Employee table below for the activities. What does the following expression do? r(ID, Name, Compensation)(Employee) Computes total employee compensation. Renames the Employee table to Compensation. Renames the Salary column to Compensation.
Renames the Salary column to Compensation.
In a ring topology used for sharding, which of the following is true? Ring topologies can be used to help achieve consistent hashing A large fraction of the key value pairs will have to be rehashed if a partition/server is added or removed If you use a ring topology, you will not have to remap ANY keys if servers are added or removed Another term for ring topology is membership protocol
Ring topologies can be used to help achieve consistent hashing
Which SQL statement below does the following MongoDB script most closely correspond to? db.students.find( {year: 4, age:{$gt: 21}}) SELECT * FROM students WHERE year = 4 AND age > 21; SELECT age, year FROM students WHERE year = 4 AND age = "$gt:21"; SELECT * FROM students WHERE age > 21 AND gt = TRUE; SELECT year, age FROM students WHERE year = 4;
SELECT * FROM students WHERE year = 4 AND age > 21;
Refer to the Department table. Department What departments are deleted? DELETE FROM Department WHERE ManagerID = 6381; Sales Marketing Sales and Marketing
Sales and Marketing
MapReduce takes advantage of what key principle behind document stores and other NoSQL systems? Scaling and replicating data over a large number of nodes/servers Using the most powerful computers possible Not being constrained by the SQL programming language the fact that data is stored in documents
Scaling and replicating data over a large number of nodes/servers
In the ring topology architecture above, where would a data item that hashes to position 0.25 be partitioned to? Server 2 Server 0 Server 1
Server 2
In Neo4j's Cypher language, what is the difference between these two MATCH statements? Statement 1: MATCH (d:Drinker)-[:LIKES]-(b:Beer) Statement 2: MATCH (d:Drinker)--(b:Beer) Statement 1 will identify the pattern where there is a LIKES relationship but not other types of relationships Statement 1 will identify a beer that a drinker likes and will also identify the brewery that brews the beer Both statements are exactly the same Statement 2 WILL NOT identify a beer that a drinker likes
Statement 1 will identify the pattern where there is a LIKES relationship but not other types of relationships
Which of the following statements on tuple and document stores is correct? The key can be put in one of the attributes It has a less rich API than a key-value store Indices are not possible to define Document stores are always implemented using JSON
The key can be put in one of the attributes
The SQL statement below is used to select students with the last name "Smith". What is wrong with the statement? SELECT FirstName FROM Student WHERE LastName = Smith; The WHERE clause should be removed. The literal "Smith" should be surrounded by single or double quotes. The last name "Smith" may not exist in the database
The literal "Smith" should be surrounded by single or double quotes.
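A quick way to see the problem, sketched with `sqlite3` and an invented Student table: without quotes, Smith is parsed as an identifier (a column name), not as a string literal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (FirstName TEXT, LastName TEXT)")
# Hypothetical rows for illustration.
conn.executemany(
    "INSERT INTO Student VALUES (?, ?)",
    [("Ana", "Smith"), ("Ben", "Jones")],
)

# Unquoted: the database looks for a column called Smith and fails.
try:
    conn.execute("SELECT FirstName FROM Student WHERE LastName = Smith")
    unquoted_error = None
except sqlite3.OperationalError as exc:
    unquoted_error = str(exc)

# Quoted: 'Smith' is a string literal and the comparison works.
quoted_rows = conn.execute(
    "SELECT FirstName FROM Student WHERE LastName = 'Smith'"
).fetchall()

print(unquoted_error, quoted_rows)
```

Single quotes are the portable choice for string literals; how double quotes are treated varies between database systems.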
Which statement is CORRECT? The recovery manager keeps track of all the database operations in a logfile. The connection manager sets up a database connection and verifies the logon credentials and the privileges. The query rewriter optimizes the query based on the current database state. The lock manager is responsible to ensure the ACID properties.
The recovery manager keeps track of all the database operations in a logfile.
Which term matches: Increase capacity by increasing speed and size of a limited number of machines. Sharding Partitioning Vertical scaling Horizontal scaling
Vertical scaling
A Series is a Pandas data structure that represents a one-dimensional array-like object of indexed data. True False
True
FareClass depends on PK (PassengerNumber, FlightCode). True False
True
Given the following MongoDB insertMany() command: The inserted documents received similar ObjectIds. True False
True
Many NoSQL databases were designed to handle problems around horizontal scaling on cheap commodity hardware. True False
True
Refer to the key-value pairs below. [email protected],'/User/pics/flower42.jpg' [email protected],'/User/pics/hat5.jpg' [email protected],'/User/pics/field.jpg' [email protected],'/User/pics/flower3.jpg' [email protected],'/User/pics/car4.jpg' [email protected],'/User/pics/door9.jpg' [email protected],'/User/pics/truck3.jpg' [email protected],'/User/pics/cape8.jpg' When: put('[email protected]','/User/pics/cat4.jpg') is executed a new entry is added to the collection. True False
True
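A minimal dictionary-based sketch of `put` and `get`. The e-mail addresses are obscured in these notes, so the keys below are placeholders, and the two helper functions are illustrative rather than any real store's API.

```python
# Placeholder keys; the paths are taken from the question.
store = {
    "user1@example.com": "/User/pics/flower42.jpg",
    "user2@example.com": "/User/pics/hat5.jpg",
}

def put(kv, key, value):
    """Add the pair if the key is new, otherwise overwrite the existing value."""
    kv[key] = value

def get(kv, key):
    """Return the value stored under key, or None if the key is absent."""
    return kv.get(key)

size_before = len(store)
put(store, "user3@example.com", "/User/pics/cat4.jpg")   # new key: entry added
size_after_new = len(store)
put(store, "user1@example.com", "/User/pics/door9.jpg")  # existing key: value replaced
size_after_overwrite = len(store)

print(size_before, size_after_new, size_after_overwrite, get(store, "user3@example.com"))
```

A new entry is added only when the key is not already present; a put on an existing key replaces its value instead.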
SQL commands can create databases and tables. True False
True
The main class of the Pandas library is the DataFrame class which models a 2-dimensional data set similar to an Excel spreadsheet or database table. True False
True
The subtype table primary key is identical to the supertype table primary key. True False
True
To access a subset of a DataFrame we can use the loc() method. True False
True
What is the value of the following expression?: (Age >= 13 AND Age <= 18) OR Military = 'Army' where Age = 8 and Military = 'Army' True False
True
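The same expression written in Python, with the values from the question substituted in:

```python
age = 8
military = "Army"

in_age_range = age >= 13 and age <= 18   # False: 8 is below 13
is_army = military == "Army"             # True
result = in_age_range or is_army         # OR needs only one true operand

print(in_age_range, is_army, result)
```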
Which of the following is not a property of a good hash function for use in key-value based storage structures? Two hashes from two inputs that differ by a small margin should also differ as little as possible. A good hash function should map its inputs as evenly as possible over the output range A hash function should return an output of fixed size. Two hashes from two inputs that differ by a small margin may differ by a wide margin
Two hashes from two inputs that differ by a small margin should also differ as little as possible.
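A short sketch of the two desirable properties, using SHA-256 from Python's `hashlib` purely as a convenient stand-in (key-value stores often use cheaper, non-cryptographic hashes). The input strings are arbitrary examples.

```python
import hashlib

def digest(text):
    """Return the SHA-256 digest of `text` as a hexadecimal string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

a = digest("user1")
b = digest("user2")  # input differs from "user1" by one character

# Fixed-size output: both digests have the same length regardless of input.
same_length = len(a) == len(b)
# Count how many hex positions differ between the two digests.
differing_positions = sum(1 for x, y in zip(a, b) if x != y)

print(len(a), same_length, differing_positions)
```

Running it shows how many of the 64 hex positions differ even though the inputs differ by a single character.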
What is the size of a typical value in a key-value data store? Usually less than a kilobyte. Usually kilobytes or megabytes. Usually gigabytes to terabytes.
Usually kilobytes or megabytes.