IST 769 Advanced Database Management QUIZ
Why can you simply drop a KSQL persistent stream? Select one: a. Because it is bound to another table or stream b. Because it's running in the background c. Because it contains data d. Because it contains metadata
b. Because it's running in the background
To create a new Kafka topic as a persistent query from a KSQL query, we use Select one: a. CREATE TABLE b. CREATE STREAM c. CREATE TOPIC d. CREATE BROKER
b. CREATE STREAM
The Cassandra key which guarantees unique values stored on each node is known as the Select one: a. Primary key b. Cluster key c. Partition key d. Unique key
b. Cluster key
Sending data to the data layer, which is either added or updated based on its existence, is an example of Select one: a. Business logic b. Data logic c. Presentation logic d. Boolean logic
b. Data logic
Which T-SQL code creates a variable to store text data such as a product name? Select one: a. Set @product_name varchar(100) b. Declare @product_name varchar(100) c. Declare product_name varchar(100) d. Set product_name varchar(100)
b. Declare @product_name varchar(100)
To output a SELECT statement as JSON, we use Select one: a. TO JSON AUTO b. FOR JSON AUTO c. FROM JSON AUTO d. AS JSON AUTO
b. FOR JSON AUTO
The system to manage distributed storage on Hadoop is called Select one: a. MapReduce b. HDFS c. YARN d. HIVE
b. HDFS
A tool for querying Hadoop data using a SQL-like syntax is Select one: a. Hcatalog b. Hive c. Pig d. HDFS
b. Hive
When considering how to improve the performance of a query, we prefer Select one: a. Table seeks over index seeks b. Index seeks over table scans c. Table scans over table seeks d. Index scans over index seeks
b. Index seeks over table scans
The T-SQL function to convert a JSON-encoded string into a JSON object is Select one: a. OPENJSON() b. JSON_QUERY() c. JSON_VALUE() d. JSON_OBJECT()
b. JSON_QUERY()
When you attempt to insert data with the same key more than once, Cassandra Select one: a. Ignores the data b. Performs an update c. Adds the same data another time d. Throws an error
b. Performs an update
An entity which is given permission to a resource is known as a/an Select one: a. Securable b. Principal c. Identity d. User
b. Principal
Which isolation level does not lock writes? Select one: a. Read committed b. Read uncommitted Correct c. Read committed snapshot d. Serializable
b. Read uncommitted Correct
Hive stores information about tables and their schema in Select one: a. The Hadoop cache b. The metastore c. HDFS folders d. HDFS files
b. The metastore
When it comes to database management systems performance and normalized data Select one: a. The more normalized your data, the more performant your DBMS b. The more normalized your data, the less performant your DBMS c. The less normalized your data, the less performant your DBMS d. There is no relationship between DBMS performance and data normalization
b. The more normalized your data, the less performant your DBMS
In MongoDb, when you write the commanddb.orders.insert( { 'name' : 'My Order', 'amount' : 5000 })what happens when "orders" does not exist? Select one: a. You are asked whether to create the orders collection. b. The orders collection is created for you before the insert. c. An error is thrown. d. The operation does not complete, but no error is shown.
b. The orders collection is created for you before the insert.
In Kafka, a subject-oriented feed of messages is known as a Select one: a. Consumer b. Topic c. Broker d. Replica
b. Topic
In Mongo DB, to delete all documents from the cars collection where the make of the car is Chevy, we write Select one: a. db.delete('cars', {'make' :'chevy' }) b. db.remove('cars', {'make' :'chevy' }) c. db.cars.delete( {'make' :'chevy' }) d. db.cars.remove( {'make' :'chevy' })
b. db.remove('cars', {'make' :'chevy' })
To upload a file name foo.csv to HDFS, we type Select one: a. hadoop dfs -put foo.csv b. hdfs dfs -put foo.csv c. hadoop fs put foo.csv d. hdfs dfs put foo.csv
b. hdfs dfs -put foo.csv
Kafka scales by distributing __________ over partitions which are then replicated. Select one: a. brokers b. topics c. producers d. consumers
b. topics
What does this T-SQL print charindex("E", "QWERY") return? Select one: a. -1 b. 2 c. 3 d. Null
c. 3
A collection of HBase columns physically stored together is known as Select one: a. A table b. A row ley c. A column family d. A cell
c. A column family
To create a transaction in T-SQL, we write Select one: a. START TRANSACTION b. BEGIN TRANSACTION c. INIT TRANSACTION d. CREATE TRANSACTION
b. BEGIN TRANSACTION
To perform column projection similar to the SELECT statement in SQL, in Pig Latin we use Select one: a. FOREACH ... GENERATE b. GROUP BY c. FROM d. FILTER
a. FOREACH ... GENERATE
The physical files which make up a SQL server database are part of the database's Select one: a. Filegroup b. Partition c. Instance d. Page
a. Filegroup
Which of the following is not a proper use case for Cassandra? Select one: a. Highly normalized data b. Applications where you must guarantee writes c. Applications with high-volume writes d. Timeseries data
a. Highly normalized data
Spark is faster than MapReduce because Select one: a. It optimizes the number of maps and reduces required to complete the task. b. It uses twice as many nodes as MapReduce. c. It was written in C++ instead of Java. d. All of the data is preloaded into memory before the task executes.
a. It optimizes the number of maps and reduces required to complete the task.
The RDBMS equivalent of a database or schema in Cassandra is known as a Select one: a. Keyspace b. Table c. Index d. Cluster
a. Keyspace
The nodes which manage the Hadoop infrastructure are known as Select one: a. Master nodes b. Node managers c. Worker nodes d. Data nodes
a. Master nodes
Which of the following is NOT a requirement of a database transaction? Select one: a. Must guarantee data is always available b. Must succeed or fail as a whole c. Must execute independently from other transactions d. Must leave all data in a consistent state
a. Must guarantee data is always available
A scripting language for Hadoop which allows you to perform data manipulation tasks is called Select one: a. Pig b. Hcatalog c. HDFS d. Hive
a. Pig
To query the foreign keys in a database, we write Select one: a. SELECT * FROM INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS b. SELECT * FROM INFORMATION_SCHEMA.CHECK_CONSTRAINTS c. SELECT * FROM INFORMATION_SCHEMA.KEYS d. SELECT * FROM INFORMATION_SCHEMA.FOREIGN_KEYS
a. SELECT * FROM INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS
Given the following T-SQL code: declare @count as int to assign 7 to this variable in T-SQL we write Select one: a. Set @count = 7 b. count = 7 c. set count = 7 d. @count = 7
a. Set @count = 7
What is the most appropriate data type to store whole numbers in the range 1 to 15,000 Select one: a. Smallint b. Decimal(5,0) c. Money d. Int
a. Smallint
To import data from a SQL database into Hadoop, we use Select one: a. Sqoop b. Flume c. Pig d. Hive
a. Sqoop It's an ETL
An unbounded, continuous flow of data in a topic is known as a Select one: a. Stream b. Consumer c. Broker d. Replica
a. Stream
The equivalent of a table row in Pig is called a Select one: a. Tuple b. Field c. Bag d. Relation
a. Tuple
Redis has how many different persistence architectures? Select one: a. Two b. One c. Four d. Three
a. Two
Every temporal table has Select one: a. Two period columns to indicate when the data is valid and a separate history table b. Two period columns to indicate when the data is valid c. A database trigger and a separate history table d. A separate history table
a. Two period columns to indicate when the data is valid and a separate history table
The distributed processing framework of Hadoop is known as Select one: a. YARN b. Flume c. Sqoop d. Pig
a. YARN
JSON lists are represented by which symbols? Select one: a. { } b. ( ) c. " " d. [ ]
d. [ ]
When the transaction of one update overwrites another, we have Select one: a. A bad lock b. An inconsistent read c. A deadlock d. A lost update
d. A lost update
The "Hadoop way" is to store our data Select one: a. Normalized b. With clustered indexes c. Denormalized d. As-is with no transformations
d. As-is with no transformations
To query a SQL temporal table's data as it has changed over time, we use which FOR SYSTEM_TIME clause? Select one: a. WITHIN b. AS OF c. ALL d. BETWEEN
d. BETWEEN
Cassandra does not support Select one: a. Integrity constraints b. Single-valued keys c. Table joins d. Cassandra does not support all of these listed.
d. Cassandra does not support all of these listed.
To see the output of a Pig relation, we type Select one: a. Describe b. Store c. Explain d. Dump
d. Dump
To limit data similar to the WHERE clause of the SELECT statement in SQL, in Pig Latin we use Select one: a. REDUCE b. WHERE c. LIMIT d. FILTER
d. FILTER
To import log data into Hadoop, we use Select one: a. HDFS b. Sqoop c. YARN d. Flume
d. Flume
Since every document requires an _id attribute, when you insert data into a collection without an _id, MongoDB will Select one: a. Figure out which combinations of attributes should be used as the _id b. Assign a value of Null to the _id c. Select the first attribute in the document as the _id d. Generate a unique _id for you
d. Generate a unique _id for you
Which of the following is NOT a common use case for database triggers? Select one: a. Audit changes to the database b. Ensure data integrity c. Prevent automatic updates d. Improve query performance
d. Improve query performance
What is an organization's most important asset? a. Its business plan b. Its custom-built software applications c. Its employees d. Its data
d. Its data
Because Redis keys are global to the database system, it is important to Select one: a. Use auto-increment numbers for keys b. Use a globally unique identifier for a key (GUID) c. Select the same key used from the RDBMS d. Namespace your keys
d. Namespace your keys
In Hadoop, schema on read refers to Select one: a. Reading the schema from HDFS with the data b. Partitioning a data lake c. Scaling data horizontally across the data lake d. Not storing schema with our data like we do with an RDBMS
d. Not storing schema with our data like we do with an RDBMS
In KSQL, the implicit columns found in every table are Select one: a. ROWKEY and TOPIC b. TOPIC, ROWKEY, and ROWTIME c. TOPIC and ROWTIME d. ROWKEY and ROWTIME
d. ROWKEY and ROWTIME
Which of the following databases uses a key-value store data model? a. Mongo DB b. Kafka c. Microsoft SQL server d. Redis
d. Redis
HBase differs from Hive and Pig HDFS in that it Select one: a. Does not use HDFS b. Does not use HCatalog c. Does not have a client application d. Supports data updates
d. Supports data updates
The more data your organization has a. The less likely your organization will be able to extract value from it b. The greater the chances it will be "big data" c. The increased likelihood of using noSQL databases d. The less likely you will have the talent to process it
d. The less likely you will have the talent to process it
In order to index a view Select one: a. The underlying tables must have a primary key or unique clustered index. b. All of these items listed are true. c. The query must be deterministic. d. The table schema must be qualified (no select *).
b. All of these items listed are true.
WEEK 3: The T-SQL command to return the number of rows affected from the previous SQL statement is Select one: a. @@ROWCOUNT b. @ROWCOUNT c. @TRANCOUNT d. @@TRANCOUNT
a. @@ROWCOUNT
To save a SQL SELECT statement under a name, we use Select one: a. A view b. The declare statement c. A stored procedure d. A scalar-valued function
a. A view
Relational database systems adhere to the __________ principle. a. ACID b. BASE c. KISS d. SQL
a. ACID
The individual steps of a database transaction are treated as a whole. This is an example of Select one: a. Atomic b. Isolated c. Consistent d. Durable
a. Atomic
The node which participates in a Kafka cluster is called a Select one: a. Broker b. Producer c. Topic d. Consumer
a. Broker
Due to its distributed nature, it is not practical to perform what in Cassandra? Select one: a. Bulk updates b. Row filters c. Bulk inserts d. Column projections
a. Bulk updates
In the Redis data model, a key Select one: a. Can be anything you want b. Must be numerical values c. Must be sequential numerical values d. Must be text values
a. Can be anything you want
One advantage of the column store index over a traditional/row store index is that the column store index Select one: a. Can cover several queried columns on the table b. Improves the performance of row updates c. Is not compressed d. Is not cached and therefore uses less memory
a. Can cover several queried columns on the table
To convert the column foo to the data type int, we type Select one: a. Cast(foo as int) b. Convert(foo, int) c. Convert(foo as int) d. Cast(foo, int)
a. Cast(foo as int)
Impala's INVALIDATE METADATA statement Select one: a. Clears the Hive metastore cache b. Clears the cached output from the table c. Clears the Hive metastore cache and reloads all the tables back into metastore cache d. Clears the Hive metastore cache and reloads the table that is part of the query back into the metastore cache
a. Clears the Hive metastore cache
The RDBMS equivalent of a table in Mongo DB is known as a Select one: a. Collection b. Document c. Replica d. Page
a. Collection
When your Cassandra query is rejected because it requires ALLOW FILTERING, you should Select one: a. Consider adding an index b. Always add it to your SELECT clause c. Rewrite your query d. Redesign your table
a. Consider adding an index
Relational databases a. Do not scale horizontally but scale vertically b. The less likely you will have the talent to process it c. Scale both horizontally and vertically d. Do not scale vertically but scale horizontally
a. Do not scale horizontally but scale vertically
In Hive, if you want the HDFS data to remain after the table is dropped you should create a/an Select one: a. External table b. Metastore table c. Internal table d. Hadoop cache tabled
a. External table
To improve read performance of Hive data, you can write the query output to Select one: a. HCatalog b. A Parquet file c. HDFS d. A text file
b. A Parquet file
When you need to query your table data under a different partition key, you should create Select one: a. A secondary partition key on the existing table b. A materialized view based on the existing table c. A new table, then copy the data over d. A secondary index based on the existing table
b. A materialized view based on the existing table
MongoDB guarantees consistent reads through Select one: a. Replication b. A single master architecture c. Sharding d. Foreign key constraints
b. A single master architecture
The difference between a T-SQL view and a table-valued function is Select one: a. A table-valued function is immutable (does not change data). b. A view is immutable (does not change data). c. A table-valued function allows for parameters. d. A view allows for parameters.
c. A table-valued function allows for parameters.
Adding an item to your shopping cart is an example of the a. Application logic layer b. Presentation layer c. Business logic layer d. Data access layer
c. Business logic layer
A SQL server table can only have one Select one: a. Nonclustered index b. Columnstore index c. Clustered index d. Indexed view
c. Clustered index
When two transactions are waiting for each other to release a lock, we have a Select one: a. Stalemate b. Circular lock c. Deadlock d. Bad lock
c. Deadlock
Most data generated today comes from a. Web logs b. Social media c. Devices d. Business transactions
c. Devices
Kafka is known as a data backbone. In this role it Select one: a. Serves as the only data store b. Serves as primary storage for all systems c. Exchanges data between other systems d. Acts as a data cache for all systems
c. Exchanges data between other systems
To store several fields under the same key in Redis, use Select one: a. Sets b. Lists c. Hashes d. Strings
c. Hashes
Which of the following is a required characteristic of any noSQL database system? a. Data does not need to be normalized b. No schema required c. Horizontal scalability d. Does not use SQL to query
c. Horizontal scalability
Which of the following is NOT a data model typically found in noSQL? a. Key value b. Column oriented c. Image d. Graph
c. Image
Impala and Hive are similar except that Select one: a. Impala does not use metastore. b. Impala does not use HDFS. c. Impala does not generate MapReduce jobs. d. Impala does not use a SQL-Like syntax.
c. Impala does not generate MapReduce jobs.
Which of the following is not a method of distributing Hive table data across nodes in the Hadoop cluster? Select one: a. Buckets Incorrect b. Skewed c. Indexes d. Partitions
c. Indexes
HBase is best suited for Select one: a. Read-only data b. SQL queries c. Low-latency operations d. High-latency operations
c. Low-latency operations
The ability of a distributed database to operate among a network outage is known as a. Data consistency b. Partition consistency c. Partition tolerance d. Data availability
c. Partition tolerance
Which of the following is not a KSQL windowing option? Select one: a. Hopping b. Tumbling c. Persistent d. Session
c. Persistent
An application which uses more than one database for storage is an example of a. Big data b. Middleware c. Polyglot persistence d. Eventual consistency
c. Polyglot persistence
To improve performance of a query, Impala Select one: a. Retains a copy of the Hive table metadata on the master node for all tables in the database b. Retains a copy of the Hive table metadata on the master node for only the tables which are part of the query c. Retains a copy of the Hive table metadata on each worker node for only the tables which are part of the query d. Retains a copy of the Hive table metadata on each worker node for all tables in the database
c. Retains a copy of the Hive table metadata on each worker node for only the tables which are part of the query
Which of the following is not one of the four core libraries of Spark? Select one: a. GraphX b. Streaming c. SKLearn d. Spark SQL
c. SKLearn
To produce a table of data from a comma-separated list of values, we use which T-SQL function? Select one: a. STRING_AGG b. STRING_COMBINE c. STRING_SPLIT d. ISNULL
c. STRING_SPLIT
Which is not a typical use case for a Redis database? Select one: a. Session store—keep track of logged-in users b. Caching system—store copies of output from other systems c. Store master data, like customer information d. Publish/subscribe system
c. Store master data, like customer information
Eventual consistency means you cannot guarantee the consistency of Select one: a. The system itself b. The data you read or write c. The data you read d. The data you write
c. The data you read
For a Cassandra table consisting of user tweets where the Twitter user name is the partition key while the timestamp serves as the cluster key, which of the following is true? Select one: a. Tweets are sorted by user name on each node in the cluster. b. Tweets from the same user name are grouped physically stored together on each node. c. Tweets from the same user name are stored on the same node in the cluster. d. Tweets with the same timestamp are stored on the same node in the cluster.
c. Tweets from the same user name are stored on the same node in the cluster.
YARN differs from MapReduce in that Select one: a. MapReduce applications can remain resident in memory on the Hadoop cluster. b. YARN supports only batch-processing activities. c. YARN supports high availability and fail over. d. MapReduce is not dependent on HDFS data nodes.
c. YARN supports high availability and fail over.
Which of the following is not an advantage of connecting HBase to the Hive metastore? Select one: a. You can load data into an HBase table with HQL, which is easier than the HBase API. b. You can query HBase tables with HQL instead of HBase's API, which is not very user friendly. c. You can take run Hive queries from an HBase client. d. Placing the HBase table in the metastore opens up table accessibility to other Hadoop tools like Impala.
c. You can take run Hive queries from an HBase client.
You should use HBase when Select one: a. You need strong data types. b. You need integrity constraints. c. Your data set scale has outgrown RDBMS. d. You require SQL.
c. Your data set scale has outgrown RDBMS.
The HDFS file system divides large files into ________, which are distributed over each of the data nodes. Select one: a. partitions b. chunks c. blocks d. replicas
c. blocks