MIS 3123 FINAL: All questions from book - Principles of Database Management


TRUE or FALSE - In the relational model, the set constructor allows defining multi-valued attribute types.

False

TRUE or FALSE - User-defined functions (UDFs) can only work on user-defined data types.

False

TRUE or FALSE - An opaque data type is an entirely new, user-defined data type, which is not based upon any existing SQL data type.

True

TRUE or FALSE - An unnamed row type allows inclusion of a composite data type in a table by using the keyword ROW.

True

TRUE or FALSE - In a relational model, the tuple constructor can only be used on atomic values and the set constructor can only be used on tuples.

True

Which statement is correct? a. A key distinguishing property of a data lake is that it stores raw data in its native format, which could be structured, unstructured, or semi-structured. b. A data lake is targeted toward decision-makers at middle- and top-management level, whereas a data warehouse requires a data scientist, which is a more specialized profile in terms of data handling and analysis. c. In case of a data warehouse, the data schema definitions are only determined when the data are read (schema-on-read), whereas for data lakes they are fixed when the data are loaded (schema-on-write). d. A data lake is less agile compared to a data warehouse, which has no structure.

a. A key distinguishing property of a data lake is that it stores raw data in its native format, which could be structured, unstructured, or semi-structured.

Which statement is not correct? a. A snowflake schema normalizes the fact table of a star schema. b. A fact constellation schema has more than one fact table which can share dimension tables. c. Surrogate keys essentially buffer the data warehouse from the operational environment by making it immune to any operational changes. d. A factless fact table is a fact table that only contains foreign keys and no measurement data.

a. A snowflake schema normalizes the fact table of a star schema.

Which data type can be used to store image data? a. BLOB. b. CLOB. c. DBCLOB. d. None of the above.

a. BLOB.

OLAP (on-line analytical processing) can help in which of the following steps of the analytics process? a. Data collection. b. Data visualization. c. Data transformation. d. Data denormalization.

a. Data collection. (she wants data visualization)

Which of the following statements is correct? a. HBase can be considered as a NoSQL database. b. HBase offers an SQL engine to query its data. c. MapReduce programs cannot be used with HBase. Data are accessed using simple put and get commands instead. d. HBase works well on large clusters as well as small ones having a few nodes.

a. HBase can be considered as a NoSQL database.

Which statement is correct? a. In the relational model, the tuple constructor can only be used on atomic values and the set constructor can only be used on tuples. b. In the relational model, the tuple constructor allows defining composite attribute types. c. In the relational model, the set constructor allows defining multi-valued attribute types. d. In the relational model, the tuple and set constructor can be used in a nested way.

a. In the relational model, the tuple constructor can only be used on atomic values and the set constructor can only be used on tuples.

Which statement is not correct? a. Junk dimensions can be defined to efficiently accommodate low-cardinality attribute types such as flags or indicators. b. An outrigger table can be defined to store a set of attribute types of a dimension table which are uncorrelated, high in cardinality, and updated simultaneously. c. For slowly changing dimensions, surrogate keys can be handy to store the historical information by duplicating a record and adding, e.g., Start_Date, End_Date, and Current_Flag attribute types. d. One way to deal with rapidly changing dimensions is by splitting the information into stable and rapidly changing information. The latter can then be put into a separate mini-dimension table with a new surrogate key. The connection can then be made by using the fact table or by introducing a new table connecting both.

b. An outrigger table can be defined to store a set of attribute types of a dimension table which are uncorrelated, high in cardinality, and updated simultaneously.

Which statement is not correct? a. Master data management (MDM) comprises a series of processes, policies, standards, and tools to help organizations define and provide multiple points of reference for all data that are "mastered". b. The focus of MDM is on unifying company-wide reference data types such as customers and products. c. Setting up an MDM initiative involves a large number of steps and tools, including data source identification, mapping out the systems architecture, constructing data transformation, cleansing and normalization rules, providing data storage capabilities, monitoring and governance facilities, and so on. d. A key element in MDM is a centrally governed data model and metadata repository.

a. Master data management (MDM) comprises a series of processes, policies, standards, and tools to help organizations define and provide multiple points of reference for all data that are "mastered".

Which of the following statements is not correct? a. Objects are blueprints of classes. "Human", "Employee", and "Sale" are examples of objects. b. Objects are instances of classes. The people "Bart Baesens", "Wilfried Lemahieu", and "Seppe vanden Broucke" could be instances of the class "Person". c. Objects store both a piece of information and ways to manipulate this information. d. A class can be instantiated into several objects.

a. Objects are blueprints of classes. "Human", "Employee", and "Sale" are examples of objects.

Which statement is not correct? a. Persistence by marking implies that all objects will be created as persistent. An object can then be marked as transient at compile time. b. Persistence by class implies that all objects of a particular class will be made persistent. c. Persistence by creation is achieved by extending the syntax for creating objects to indicate at compile time that an object should be made persistent. d. Persistence by inheritance indicates that the persistence capabilities are inherited from a predefined persistent class.

a. Persistence by marking implies that all objects will be created as persistent. An object can then be marked as transient at compile time.

Which of the following commands is not a part of HBase? a. Place. b. Put. c. Get. d. Describe.

a. Place.

Which of the following statements is not correct? a. Spark SQL exposes DataFrame and Dataset APIs which underlyingly use RDDs together with a performant SQL query engine. b. Spark SQL can be used from within Java, Python, Scala, and R. c. Spark SQL can be used through ODBC and JDBC interfaces. d. Spark SQL DataFrames need to be created by loading a file.

d. Spark SQL DataFrames need to be created by loading a file.

The key difference between stored procedures and triggers is that: a. Stored procedures are explicitly invoked whereas triggers are implicitly invoked. b. Stored procedures cannot have input variables whereas triggers can. c. Stored procedures are stored in the data catalog, whereas triggers are not. d. Stored procedures are more difficult to debug than triggers.

a. Stored procedures are explicitly invoked whereas triggers are implicitly invoked.

Which statement about object identifiers (OIDs) is correct? a. The OID of an object remains the same during the entire lifetime of the object. b. An OID is the same as a primary key in a relational database setting. c. Two objects with the same values always have the same OID. d. Each literal is defined by an OID according to the ODMG standard.

a. The OID of an object remains the same during the entire lifetime of the object.

Which of the following is correct? a. The fact that most NoSQL databases adopt an eventual consistency approach is due to the CAP theorem, which states that strong consistency cannot be obtained when availability and partitioning have to be ensured. b. Replicas in a distributed NoSQL environment relate to making periodic backups of the database to a second system. c. Stabilization relates to the waiting time between the start-up of a NoSQL system and when the system becomes available to receive user queries. d. Some relational constructs, such as the many-to-many relationship, are harder to express using graph databases.

a. The fact that most NoSQL databases adopt an eventual consistency approach is due to the CAP theorem, which states that strong consistency cannot be obtained when availability and partitioning have to be ensured.

Which statement is correct? a. The prevalent approach for indexing full-text documents is an inverted index. b. SQL is well suited to query structured collections of records as well as unstructured data such as text. c. It makes no sense to look at HTML markup when calculating the weight of a term to a page for web search. d. Enterprise search technologies are strongly related to standard web search products and providers (e.g., Google), but aim to offer a series of tools that can be deployed and used externally such that an organization can expose itself to the outside world.

a. The prevalent approach for indexing full-text documents is an inverted index.
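
As a rough illustration of the inverted index mentioned in this answer, the sketch below maps each term to the set of documents containing it. The sample documents and the function name `build_inverted_index` are made up for this example, and real search engines add tokenization, stemming, and term weighting on top.

```python
from collections import defaultdict

def build_inverted_index(documents):
    """Map each lowercase term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(documents):
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

# Hypothetical mini-collection, invented for this sketch.
docs = [
    "data lake stores raw data",
    "star schema and fact table",
    "data warehouse uses star schema",
]
index = build_inverted_index(docs)

# A lookup is a dictionary access instead of a scan over all documents.
print(index["data"])                  # documents 0 and 2
print(index["star"] & index["data"])  # documents containing both terms
```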

Which statement is correct? a. Using XSLT, an XML document can be transformed to another XML document. b. Using HTML, an XML document can be transformed to an XSLT document. c. Using XML, an XSLT document can be transformed to an XML document. d. Using DTD, an XML document can be transformed to an HTML document.

a. Using XSLT, an XML document can be transformed to another XML document.

What do the 5 Vs of Big Data stand for? a. Volume, variety, velocity, veracity, value. b. Volume, visualization, velocity, variety, value. c. Volume, variety, velocity, variability, value. d. Volume, versatile, velocity, visualization, value.

a. Volume, variety, velocity, veracity, value.

The federation pattern typically follows... a. a pull approach. b. a push approach.

a. a pull approach

The orchestration pattern to manage sequence and data dependencies is a... a. centralized approach. b. decentralized approach.

a. centralized approach

XML focuses on the... a. content of documents. b. representation of documents.

a. content of documents.

Bootstrapping refers to... a. drawing samples with replacement. b. drawing samples without replacement.

a. drawing samples with replacement.
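
A minimal sketch of bootstrapping, assuming a small made-up list of observations: `random.choices` draws with replacement, so the same observation can appear more than once in a bootstrap sample. The helper name `bootstrap_sample` is hypothetical.

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) observations from data WITH replacement."""
    return rng.choices(data, k=len(data))

rng = random.Random(42)  # fixed seed so the run is repeatable
observations = [10, 20, 30, 40, 50]  # invented example data
sample = bootstrap_sample(observations, rng)

# Same size as the original dataset; individual values may repeat.
print(sample)
```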

Outlying observations which represent erroneous data are treated using... a. missing value procedures. b. truncation or capping.

a. missing value procedures.

A key difference between XML data and relational data is that... a. relational data assume atomic data types, whereas XML data can consist of aggregated types. b. relational data are ordered, whereas XML data are unordered. c. relational data can be nested, whereas XML data cannot be nested. d. relational data can be multi-valued, whereas XML data cannot be multi-valued.

a. relational data assume atomic data types, whereas XML data can consist of aggregated types.

Objects should be made persistent when... a. you need them over multiple program executions. b. you only need them during one program execution, and then never again.

a. you need them over multiple program executions.

Which of these statements is correct? a. A set is an ordered collection with no duplicates. b. A bag is an unordered collection which may contain duplicates. c. A list is an ordered collection which cannot contain duplicates. d. An array is an unordered collection which can contain duplicates.

b. A bag is an unordered collection which may contain duplicates.
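
The collection types in this question can be loosely mirrored with Python built-ins. This is only an analogy, not the ODMG definitions themselves: `set` drops duplicates, `collections.Counter` behaves like a bag by tracking how often each element occurs, and `list` keeps order and duplicates.

```python
from collections import Counter

values = ["b", "a", "b", "c"]  # invented example data

as_set = set(values)      # no duplicates, no guaranteed order
as_bag = Counter(values)  # duplicates kept as counts, no positional order
as_list = list(values)    # ordered, duplicates allowed

print(len(as_set))   # 3 distinct elements
print(as_bag["b"])   # 2, because "b" occurs twice
print(as_list)       # original order preserved
```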

Which of the following is correct? a. User-defined functions (UDFs) can only work on user-defined data types. b. A sourced function is a user-defined function (UDF) that is based on an existing, built-in function. c. User-defined functions (UDFs) can only be defined in SQL. d. User-defined functions (UDFs) must be stored in the application, and not in the catalog.

b. A sourced function is a user-defined function (UDF) that is based on an existing, built-in function.

Which statement is not correct? a. The essence of data consolidation as a data integration pattern is to capture the data from multiple, heterogeneous source systems and integrate it into a single persistent store (e.g., a data warehouse or data mart). b. An important disadvantage of the consolidation approach is that it does not cater for historical data. c. An ETL process typically induces a certain measure of latency, so the timeliness dimension may suffer, with the data being slightly out of date. d. Besides the traditional set-up with ETL and a data warehouse, data lakes can also be considered an implementation of the consolidation pattern.

b. An important disadvantage of the consolidation approach is that it does not cater for historical data.

Examine the following decision tree: According to the decision tree, an applicant with Income > $50,000 and High Debt = Yes is classified as: a. Good risk. b. Bad risk.

b. Bad risk.

Which statement is not correct? a. Data virtualization isolates applications and users from the actual (combinations of) data integration patterns used. b. Data virtualization extensively uses data consolidation techniques such as ETL. c. Contrary to a federated database as offered by basic EII, data virtualization does not impose a single data model on top of the heterogeneous data sources. d. In many real-life contexts, a data integration exercise is an ongoing initiative within an organization, and will often combine a variety of integration strategies and approaches.

b. Data virtualization extensively uses data consolidation techniques such as ETL.

Which of the following statements is not correct? a. Graphs are mathematical structures consisting of nodes and edges. b. Graph models are not capable of modeling many-to-many relationships. c. Edges in graphs can be uni- or bidirectional. d. Graph databases work particularly well on tree-like structures.

b. Graph models are not capable of modeling many-to-many relationships.

Which components does the base Hadoop stack include? a. NDFS, MapReduce, and YARN. b. HDFS, MapReduce, and YARN. c. HDFS, Map, and Reduce. d. HDFS, Spark, and YARN.

b. HDFS, MapReduce, and YARN.

In terms of data manipulation, a data warehouse focuses on... a. Insert/Update/Delete/Select statements. b. Insert/Select statements. c. Select/Update statements. d. Delete statements.

b. Insert/Select statements.

Which of the following statements is not correct about XPath? a. It is a simple, declarative language. b. It considers an XML document as a set of XML elements. c. It uses path expressions to refer to parts of an XML document. d. Every navigation step results in a node or list of nodes which can then be used to continue the navigation.

b. It considers an XML document as a set of XML elements.

What statement about XQuery is not correct? a. It allows making use of both the document structure and its content. b. It does not allow joining information from different XML documents. c. It uses XPath expressions to navigate through the document. d. The end results can be sorted.

b. It does not allow joining information from different XML documents.

What statement about OQL is not correct? a. OQL is a declarative, non-procedural query language. b. Join queries are not supported in OQL. c. OQL can be used for both navigational (procedural) as well as associative (declarative) access. d. The OQL language provides no explicit support for INSERT, UPDATE, and DELETE operations.

b. Join queries are not supported in OQL.

Which of the following statements is correct? a. One of the disadvantages of Spark is that it does not support streaming data. b. One of the disadvantages of Spark is that its streaming and machine learning APIs are still mostly RDD-based. c. One of the disadvantages of Spark is that it has no way to deal with graph-based data. d. One of the disadvantages of Spark is that its streaming API does not allow joining multiple streams.

b. One of the disadvantages of Spark is that its streaming and machine learning APIs are still mostly RDD-based.

Which of the following statements is not correct? a. An RDF data model consists of statements which are in subject-predicate-object format. b. RDF allows use of database-specific primary keys to identify resources. c. An RDF data model can be visualized as a directed, labeled graph. d. RDF Schema enriches RDF by extending its vocabulary with classes and subclasses, properties and subproperties, and typing of properties.

b. RDF allows use of database-specific primary keys to identify resources.

In the case that an application needs to process large XML documents in a sequential way, it is recommended to use the... a. DOM API. b. SAX API.

b. SAX API.

Which statement is correct? a. A star schema has one large central dimension table which is connected to various smaller fact tables. b. The dimension tables of a star schema contain the criteria for aggregating the measurement data and will typically be used as constraints to answer queries. c. To speed up report generation and avoid time-consuming joins in a star schema, the dimension tables need to be normalized. d. The dimension tables in a star schema are frequently updated.

b. The dimension tables of a star schema contain the criteria for aggregating the measurement data and will typically be used as constraints to answer queries.

Which of the following statements about ODL is correct? a. ODL is only optimized for Java objects. b. The extent of a class is the set of all current instances. c. Many-to-many relationships cannot be expressed using ODL. d. Unary, binary, and ternary relationships are supported in ODL.

b. The extent of a class is the set of all current instances.

Which of the following is not an advantage of triggers? a. Triggers support automatic monitoring and verification in case of specific events or situations. b. Triggers allow avoidance of deadlock situations. c. Triggers allow modeling extra semantics and/or integrity rules without changing the user front-end or application code. d. Triggers allow performance of synchronic updates in case of data replication.

b. Triggers allow avoidance of deadlock situations.

In Java, what is method overloading? a. Putting so much code in a method that its functionality becomes hard to understand. b. Using two methods with the same name, but a different number (and/or different type) of arguments. c. Offering the user of your class all possible methods that he/she would like to perform on the variables the class offers. d. Making sure that every method uses all variables of the class.

b. Using two methods with the same name, but a different number (and/or different type) of arguments.

Pig is... a. a programming language that can be used to query HDFS data. b. a project offering a programming language to provide more user-friendliness compared to MapReduce programs. c. a database that runs on Hadoop. d. an SQL engine that runs on top of Hadoop.

b. a project offering a programming language to provide more user-friendliness compared to MapReduce programs.

Enterprise information integration (EII) is an example of... a. data consolidation. b. data federation. c. data propagation. d. data replication.

b. data federation.

The choreography pattern to manage sequence and data dependencies is a... a. centralized approach. b. decentralized approach.

b. decentralized approach.

Clustering, association rules, and sequence rules are examples of... a. predictive analytics. b. descriptive analytics.

b. descriptive analytics.

In industry, ORDBMSs have... a. been very successful since they replaced RDBMSs as the mainstream database technology. b. had modest success, with most companies only implementing a carefully selected set of extensions. c. not been successful at all.

b. had modest success, with most companies only implementing a carefully selected set of extensions.

Consider a dataset with a multiclass target variable as follows: 25% bad payers, 25% poor payers, 25% medium payers, and 25% good payers. In this case, the entropy will be... a. minimal. b. maximal.

b. maximal.
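
To see why the balanced case gives maximal entropy, the sketch below computes Shannon entropy in bits for the 25/25/25/25 split from the question and compares it with a skewed split and a pure node. The skewed proportions are invented for comparison only.

```python
import math

def entropy(proportions):
    """Shannon entropy in bits; classes with proportion 0 contribute nothing."""
    return -sum(p * math.log2(p) for p in proportions if p > 0)

balanced = entropy([0.25, 0.25, 0.25, 0.25])  # the split from the question
skewed = entropy([0.97, 0.01, 0.01, 0.01])    # hypothetical skewed split
pure = entropy([1.0, 0.0, 0.0, 0.0])          # all observations in one class

print(balanced)  # 2.0, which equals log2(4), the maximum for four classes
print(skewed < balanced)
```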

In an enterprise application integration (EAI) context, asynchronous communication between objects and/or applications can be achieved by means of... a. remote procedure call (RPC). b. message-oriented middleware (MOM).

b. message-oriented middleware (MOM).

Which statement is not correct? a. Query and reporting tools are an essential component of a comprehensive business intelligence solution. b. A pivot or cross-table is a popular data summarization tool. It essentially cross-tabulates a set of dimensions. c. A key disadvantage of OLAP is that it does not allow you to interactively analyze your data, summarize it, and visualize it in various ways. d. The key fundament of OLAP is a multidimensional data model which can be implemented in various ways.

c. A key disadvantage of OLAP is that it does not allow you to interactively analyze your data, summarize it, and visualize it in various ways.

The GIGO principle mainly relates to which aspect of the analytics process? a. Data selection. b. Data transformation. c. Data cleaning. d. All of the above.

c. Data cleaning. (she wants all of the above)

Which statement is correct? a. Dynamic binding means that objects are allowed to take the form of either the class they are an instance of, or any of its subclasses. b. In an inheritance structure with a parent class "Animal" and subclass "Chicken" at most one of these classes is allowed to have a method with the name "makeNoise". c. Different subclasses of a parent class can all have different implementations of methods with the same name, number of parameters, and parameter types. d. Static binding occurs at runtime whereas dynamic binding occurs at compile time.

c. Different subclasses of a parent class can all have different implementations of methods with the same name, number of parameters, and parameter types.
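
The book's questions are phrased in terms of Java, but the idea in this answer can be sketched in Python as well: each subclass supplies its own implementation of a method with the same name and parameters, and the version that runs is chosen at runtime from the object's actual class. The class names follow the "Animal"/"Chicken" example in the question; `Cow` and the noise strings are invented.

```python
class Animal:
    def make_noise(self):
        return "..."

class Chicken(Animal):
    def make_noise(self):  # same name and parameters as the parent method
        return "cluck"

class Cow(Animal):
    def make_noise(self):  # a different implementation in another subclass
        return "moo"

# The method that runs depends on the runtime class of each object.
noises = [animal.make_noise() for animal in (Chicken(), Cow(), Animal())]
print(noises)
```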

What does the following Cypher query express? a. Return all of Bart's friends, and their friends as well. b. Do not return Bart's friends, but return their friends. c. Do not return Bart's friends, but return their friends if Bart does not know them. d. Return Bart's friends who have exactly one other friend.

c. Do not return Bart's friends, but return their friends if Bart does not know them.

Which of the following is correct? a. Document stores require users to define document schemas before data can be inserted. b. Document stores require that you perform all filtering and aggregation logic in your application. c. Document stores are built on the same ideas as key-value- and tuple-based database systems. d. Document stores do not provide SQL-like capabilities.

c. Document stores are built on the same ideas as key-value- and tuple-based database systems.

Which statement about ETL is not correct? a. Some estimates state that the ETL step can consume up to 80% of all efforts needed to set up a data warehouse. b. To decrease the burden on both the operational systems and the data warehouse itself, it is recommended to start the ETL process by dumping the data in a staging area where all the ETL activities can be executed. c. During the loading step, the data warehouse is populated by filling the fact and dimension tables, thereby also generating the necessary surrogate keys to link it all up. Fact rows should be inserted/updated before the dimension rows. d. The extraction strategy can be either full or incremental. In the latter case, only the changes since the previous extraction are considered.

c. During the loading step, the data warehouse is populated by filling the fact and dimension tables, thereby also generating the necessary surrogate keys to link it all up. Fact rows should be inserted/updated before the dimension rows.

Which statement about "Encapsulation" is correct? a. Encapsulation refers to storing a value in a variable, and never changing it again. This way its value is safe forever. b. Encapsulation refers to storing a value variable, and making it impossible to retrieve it. c. Encapsulation refers to controlling the way a variable is accessed by forcing users to use getter/setter methods that prevent misuse of the variable. d. Encapsulation implies that the methods of a class are not accessible to the other classes.

c. Encapsulation refers to controlling the way a variable is accessed by forcing users to use getter/setter methods that prevent misuse of the variable.
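
A small sketch of the getter/setter idea, using a hypothetical `Account` class. Note that Python only marks `_balance` as internal by convention, whereas Java would enforce this with the `private` keyword; the point here is just that the setter is the one place where misuse is checked.

```python
class Account:
    def __init__(self, balance=0):
        self._balance = 0          # leading underscore: internal by convention
        self.set_balance(balance)  # route even the initial value through the check

    def get_balance(self):
        return self._balance

    def set_balance(self, value):
        # The setter guards against misuse of the variable.
        if value < 0:
            raise ValueError("balance cannot be negative")
        self._balance = value

account = Account(100)
account.set_balance(250)
print(account.get_balance())

try:
    account.set_balance(-5)  # rejected, so the stored value stays valid
except ValueError as error:
    print(error)
```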

Which of the following measures cannot be used to make the splitting decision in a regression tree? a. Mean squared error (MSE). b. ANOVA/F-test. c. Entropy.

c. Entropy.

Which of the following statements is correct? a. DataNodes in HDFS store a registry of metadata. b. The HDFS NameNode sends regular heartbeat messages to its DataNodes. c. HDFS is composed of a NameNode, DataNodes, and an optional SecondaryNameNode. d. Both the SecondaryNameNode and primary NameNode can simultaneously handle requests from clients.

c. HDFS is composed of a NameNode, DataNodes, and an optional SecondaryNameNode.

What statement about SQL/XML is not correct? a. It introduces a new XML data type. b. It includes facilities for mapping relational data to XML. c. It includes rules for shredding XML data into SQL. d. The result of an SQL/XML query can be a combination of both relational and XML data types.

c. It includes rules for shredding XML data into SQL.

Using Cypher, how do you get a list of all movies Wilfried Lemahieu has liked, when he has given at least four stars? a. Select (b:User)--(m:Movie) Where b.name = "Wilfried Lemahieu" AND m.stars >= 4 b. Match (b:User)-[l:LIKES]-(m:Movie) Where b.name = "Wilfried Lemahieu" AND m.stars >= 4 Return m c. Match (b:User)-[l:LIKES]-(m:Movie) Where b.name = "Wilfried Lemahieu" AND l.stars >= 4 RETURN m d. Select (b:User)--(m:Movie) Where b.name = "Wilfried Lemahieu" AND l.stars >= 4 RETURN m

c. Match (b:User)-[l:LIKES]-(m:Movie) Where b.name = "Wilfried Lemahieu" AND l.stars >= 4 RETURN m

Which of the following statements is correct? a. Missing values should always be replaced or removed. b. Outliers should always be replaced or removed. c. Missing values and outliers can potentially provide useful information and should be analyzed before they are removed/replaced. d. Missing values and outliers should both always be replaced or removed.

c. Missing values and outliers can potentially provide useful information and should be analyzed before they are removed/replaced.

Which of the following statements describes NoSQL databases best? a. A NoSQL database offers no support for SQL. b. NoSQL databases do not support joins. c. NoSQL databases are non-relational. d. NoSQL databases are not capable of dealing with large datasets.

c. NoSQL databases are non-relational.

Which of the following statements is not correct? a. RDDs allow for two forms of operations: transformations and actions. b. RDDs represent an abstract, immutable data structure. c. RDDs are structured and represent a collection of columnar objects. d. RDDs offer failure protection by tracking the lineage of operations that are applied on them.

c. RDDs are structured and represent a collection of columnar objects.

Which of the following statements is not correct? a. A mapper in Hadoop maps each element in a collection to one or more output elements. b. A reducer in Hadoop reduces a collection of elements to one or more output elements. c. Reducer workers in Hadoop will start once all mapper workers have finished. d. A MapReduce pipeline in Hadoop can include an optional Sorter to sort the final output.

c. Reducer workers in Hadoop will start once all mapper workers have finished.

Which of the following strategies can be used to deal with missing values? a. Keep. b. Delete. c. Replace/impute. d. All of the above.

d. All of the above.

What is not an advantage of OODBMSs? a. They allow storing objects and relationships in a transparent way. b. They solve the impedance mismatch problem by using the same data model as the programming language. c. Scalability and fault tolerance of OODBMSs is far better than that of their relational counterparts. d. The identity-based approach allows for improved performance when performing complex queries involving multiple interrelated objects, avoiding expensive joins.

c. Scalability and fault tolerance of OODBMSs is far better than that of their relational counterparts.

Which of the following schema-handling methods does Hive apply? a. Schema on write. b. Schema on load. c. Schema on read. d. Schema on query.

c. Schema on read.

Which statement is not correct? a. Analytics techniques are more and more used at the operational level as well by front-line employees. b. Analytics for tactical/strategic decision-making increasingly uses real-time operational data combined with the aggregated and historical data found in more traditional data warehouses. c. The operational usage of business intelligence aims for a low (or even zero) latency so interesting events or trends in the data can be immediately detected and accompanied with the appropriate response. d. Nowadays, we see a complete divergence of the operational and tactical/strategic data needs and of the corresponding data integration tooling.

d. Nowadays, we see a complete divergence of the operational and tactical/strategic data needs and of the corresponding data integration tooling.

Given the following five transactions: T1 {K, A, D, B} T2 {D, A, C, E, B} T3 {C, A, B, D} T4 {B, A, E} T5 {B, E, D}, consider the associationrule R: A ➔ BD. Which statement is correct? a. The support of R is 100% and the confidence is 75%. b. The support of R is 60% and the confidence is 100%. c. The support of R is 75% and the confidence is 60%. d. The support of R is 60% and the confidence is 75%.

d. The support of R is 60% and the confidence is 75%.
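
The numbers for rule R: A ➔ BD can be checked by counting. The sketch below uses the five transactions from the question; the helper name `count` is invented. Support is the share of all transactions containing A, B, and D together, and confidence is that same count divided by the number of transactions containing A.

```python
transactions = [
    {"K", "A", "D", "B"},       # T1
    {"D", "A", "C", "E", "B"},  # T2
    {"C", "A", "B", "D"},       # T3
    {"B", "A", "E"},            # T4
    {"B", "E", "D"},            # T5
]

def count(itemset):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)

antecedent = {"A"}
consequent = {"B", "D"}

rule_count = count(antecedent | consequent)   # T1, T2, T3 contain A, B, and D
support = rule_count / len(transactions)      # 3 of 5 transactions
confidence = rule_count / count(antecedent)   # 3 of the 4 transactions with A

print(support)     # 0.6
print(confidence)  # 0.75
```

This gives a support of 60% and a confidence of 75%, which matches option d.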

Which of the following statements is not correct? a. Velocity in Big Data refers to data "in movement". b. Volume in Big Data refers to data "at rest". c. Veracity in Big Data refers to data "in change". d. Variety in Big Data refers to data "in many forms".

c. Veracity in Big Data refers to data "in change".

When are column-oriented databases more efficient? a. When many columns of a single group need to be fetched at the same time. b. When inserts are performed where all of the row data are supplied at the same time. c. When aggregates need to be calculated over many or all rows in the dataset. d. When a lot of joins need to be performed in queries.

c. When aggregates need to be calculated over many or all rows in the dataset.

Which of the following protocols allows for an XML document to be transformed to another XML document? a. DTD b. HTML c. XSLT d. much more verbose.

c. XSLT

Ideally, data integration should include... a. only data. b. only processes. c. both processes and data.

c. both processes and data.

Process execution languages such as WS-BPEL aim at managing... a. only the control flow. b. only the data flow. c. both the control and data flow.

c. both the control and data flow.

Enterprise application integration (EAI) and enterprise data replication (EDR) are examples of... a. data consolidation. b. data federation. c. data propagation. d. data virtualization.

c. data propagation.

When compared against XML, both JSON and YAML are... a. not human readable. b. unable to provide support for ordered elements such as arrays. c. less technically mature. d. much more verbose.

c. less technically mature. (This is the actual answer, but she wants a. not human readable.)

Recursive queries are a powerful SQL extension which allow formulation of complex queries such as... a. queries that need to combine data from multiple tables. b. queries that need to get access to multimedia data. c. queries that need to navigate through a hierarchy of tuples. d. queries that have multiple subqueries.

c. queries that need to navigate through a hierarchy of tuples.
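
As an illustration of navigating a hierarchy of tuples, here is a minimal sketch using a recursive common table expression (`WITH RECURSIVE`) in SQLite through Python's `sqlite3` module. The `employee` table, its columns, and the sample rows are hypothetical and are not taken from the book.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical hierarchy: each employee points to a manager in the same table.
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "Ann", None), (2, "Bob", 1), (3, "Cid", 2), (4, "Dee", 1)],
)

rows = conn.execute(
    """
    WITH RECURSIVE subordinates(id, name, level) AS (
        -- Anchor member: the root of the hierarchy (no manager).
        SELECT id, name, 0 FROM employee WHERE manager_id IS NULL
        UNION ALL
        -- Recursive member: everyone managed by a row found so far.
        SELECT e.id, e.name, s.level + 1
        FROM employee AS e
        JOIN subordinates AS s ON e.manager_id = s.id
    )
    SELECT name, level FROM subordinates ORDER BY level, name
    """
).fetchall()

print(rows)  # [('Ann', 0), ('Bob', 1), ('Dee', 1), ('Cid', 2)]
```

A plain join would reach only one level of the hierarchy per join; the recursive member keeps following `manager_id` links until no new rows are found.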

Which of the following statements is not correct? a. At the operational level, day-to-day business decisions are made, typically in real-time or with a short time frame. b. At the tactical level, decisions are made by middle management with a medium-term (e.g., a month, a quarter, a year) focus. c. At the strategic level, decisions are made by senior management with long-term implications (e.g., 1, 2, 5 years, or more). d. A data warehouse provides a centralized, consolidated data platform by integrating data from different sources and in different formats. As such, it provides a separate and dedicated environment for operational decision-making.

d. A data warehouse provides a centralized, consolidated data platform by integrating data from different sources and in different formats. As such, it provides a separate and dedicated environment for operational decision-making.

Which statement is not correct? a. A data mart is a scaled-down version of a data warehouse aimed at meeting the information needs of a homogeneous small group of end-users such as a department or business unit (e.g., marketing, finance, logistics, HR, etc.). b. Dependent data marts pull their data from a central data warehouse, whereas independent data marts are standalone systems drawing data directly from the operational systems, external sources, or a combination of both. c. A virtual data warehouse (sometimes also called a federated database) or virtual data mart contains no physical data but provides a uniform and consolidated single point of access to a set of underlying physical data stores. d. A key advantage of virtualization is that it requires no extra processing capacity from the underlying (operational) data stores.

d. A key advantage of virtualization is that it requires no extra processing capacity from the underlying (operational) data stores.

Which of the following are properties of SPARQL? a. It is based upon matching graph patterns. b. It can query RDF graphs. c. It provides support for namespaces. d. All of the above.

d. All of the above. (The question might read NOT PROPERTIES, but the answer is the same.)

Which of the following is not an example of a NoSQL database? a. Graph-based databases. b. XML-based databases. c. Document-based databases. d. All three can be regarded as NoSQL databases.

d. All three can be regarded as NoSQL databases. (FEDERATED)

Which of the following is not one of the reasons why Spark programs are generally faster than MapReduce operations? a. Because Spark tries to keep its RDDs in memory as long as possible. b. Because Spark uses a directed acyclic graph instead of MapReduce. c. Because RDD transformations are "lazily" applied. d. Because Mesos can be used as a resource manager instead of YARN.

d. Because Mesos can be used as a resource manager instead of YARN.

Decision trees can be used in the following applications: a. Credit risk scoring. b. Credit risk scoring and churn prediction. c. Credit risk scoring, churn prediction, and customer profile segmentation. d. Credit risk scoring, churn prediction, customer profile segmentation, and market basket analysis.

d. Credit risk scoring, churn prediction, customer profile segmentation, and market basket analysis.

Which of the following statements is not correct? a. Hive offers an SQL engine to query Hadoop data. b. Hive's query language is not as feature-complete as the full SQL standard. c. Hive offers a JDBC interface. d. Hive queries run much faster than hand-written MapReduce programs.

d. Hive queries run much faster than hand-written MapReduce programs.

Which statement is not correct? a. Process integration is to integrate and harmonize the various business processes in an organization as much as possible. b. The control flow perspective of a business process specifies the correct sequencing of tasks (e.g., a loan offer can only be made when the credit score has been calculated). c. The data flow perspective of a business process focuses on the inputs of the tasks (e.g., the interest rate offered depends on the credit score). d. In a service-oriented context, there is a tendency to physically integrate services with the purpose of task coordination with services that perform the actual task execution and services that provide access to the necessary data.

d. In a service-oriented context, there is a tendency to physically integrate services with the purpose of task coordination with services that perform the actual task execution and services that provide access to the necessary data.

Which of the following statements about XML Schema is not correct? a. It is more verbose than DTD. b. It allows specification of minimum and maximum cardinalities. c. Various data types are supported such as xs:string, xs:short, xs:byte, etc. d. It is not defined using XML syntax.

d. It is not defined using XML syntax.

Which statement is not correct? a. Multidimensional OLAP (MOLAP) stores the multidimensional data using a multidimensional DBMS (MDBMS) whereby the data are stored in a multidimensional array-based data structure optimized for efficient storage and quick access. b. Relational OLAP (ROLAP) stores the data in a relational data warehouse, which can be implemented using a star, snowflake, or fact constellation schema. c. Hybrid OLAP (HOLAP) tries to combine the best of both MOLAP and ROLAP. An RDBMS can then be used to store the detailed data in a relational data warehouse, whereas the precomputed aggregated data can be kept as a multidimensional array managed by an MDBMS. d. MOLAP scales better to more dimensions than ROLAP. The query performance may, however, be inferior to ROLAP unless some of the queries are materialized or high-performance indexes are defined.

d. MOLAP scales better to more dimensions than ROLAP. The query performance may, however, be inferior to ROLAP unless some of the queries are materialized or high-performance indexes are defined.

A key benefit of REST when compared to SOAP for web services is that... a. REST has an official standard. b. REST only allows XML for exchanging requests and responses. c. REST is communication agnostic, whereas SOAP is tightly integrated with HTTP. d. REST is built directly on top of HTTP and is less verbose and heavy than SOAP.

d. REST is built directly on top of HTTP and is less verbose and heavy than SOAP.

Which of the following statements is not correct? a. Apart from handling MapReduce programs, YARN can also be used to manage other types of applications. b. YARN's JobHistoryServer keeps a log of all finished jobs. c. NodeManagers in YARN are responsible for setting up containers on the node hosting a particular (sub) task. d. The YARN ApplicationMaster contains a scheduler which will hold submitted jobs in a queue until they are deemed ready to start.

d. The YARN ApplicationMaster contains a scheduler which will hold submitted jobs in a queue until they are deemed ready to start.

Which of the following is not a property of a good hash function for use in key-value-based storage structures? a. A hash function should always return the same output for the same input. b. A hash function should return an output of fixed size. c. A good hash function should map its inputs as evenly as possible over the output range. d. Two hashes from two inputs that differ little should also differ as little as possible.

d. Two hashes from two inputs that differ little should also differ as little as possible.
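
The properties in options a to c, and the opposite of option d, can be observed with a short experiment. This is a sketch only: it uses SHA-256 from Python's `hashlib` as a stand-in, which is an assumption made here for illustration (key-value stores often use faster, non-cryptographic hash functions), and the key strings are hypothetical.

```python
import hashlib
from collections import Counter


def h(text: str) -> str:
    """Hex digest of the SHA-256 hash of text (stand-in hash function)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


a = h("customer:1001")
b = h("customer:1002")  # input differs from the first in one character

# (a) Same input, same output.
deterministic = a == h("customer:1001")

# (b) Output size is fixed regardless of input size (64 hex characters).
fixed_size = len(a) == len(h("x" * 10_000)) == 64

# (d, negated) Similar inputs should give very different hashes:
# count the hex positions where the two digests disagree.
differing_positions = sum(x != y for x, y in zip(a, b))

# (c) Keys spread roughly evenly over the output range, here 4 buckets.
buckets = Counter(int(h(f"key{i}"), 16) % 4 for i in range(1000))

print(deterministic, fixed_size, differing_positions, sorted(buckets.items()))
```

With a well-behaved hash function, most of the 64 positions differ even though the inputs differ in a single character, and each of the four buckets receives roughly a quarter of the keys.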

Which of the following is not a characteristic of a data warehouse? a. Subject-oriented. b. Integrated. c. Time-variant. d. Volatile.

d. Volatile.

An ORDBMS will typically support inheritance... a. only at tuple level. b. only at data type level. c. only at table type level. d. at both data type and table type level.

d. at both data type and table type level.

Which of the following is correct? a. A distinct data type is a user-defined data type which specializes a standard, built-in SQL data type. b. An opaque data type is an entirely new, user-defined data type, which is not based upon any existing SQL data type. c. An unnamed row type allows inclusion of a composite data type in a table by using the keyword ROW. d. A named row type is a user-defined data type that groups a coherent set of data types into a new composite data type and assigns a meaningful name to it. e. All of the above are correct.

e. All of the above are correct.

Which statement is correct? a. Roll-up (or drill-up) refers to aggregating the current set of fact values within or across one or more dimensions. b. Roll-down (or drill-down) de-aggregates the data by navigating from a lower level of detail to a higher level of detail. c. Slicing represents the operation whereby one of the dimensions is set at a particular value. d. Dicing corresponds to a range selection on one or more dimensions. e. All of the above are correct.

e. All of the above are correct.
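
The four OLAP operations can be sketched on a tiny in-memory fact table. The rows, the dimension names, and the `aggregate` helper below are hypothetical and serve only to illustrate the definitions in the question.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, quarter, region, product, sales)
facts = [
    (2023, "Q1", "EU", "A", 10),
    (2023, "Q2", "EU", "A", 20),
    (2023, "Q1", "US", "B", 30),
    (2024, "Q1", "EU", "B", 40),
    (2024, "Q2", "US", "A", 50),
]


def aggregate(rows, key):
    """Sum the sales measure per value of key(row)."""
    totals = defaultdict(int)
    for row in rows:
        totals[key(row)] += row[4]
    return dict(totals)


# Quarter-level view of the time dimension. Going from by_year to
# by_quarter is a roll-down (drill-down): the data are de-aggregated.
by_quarter = aggregate(facts, lambda r: (r[0], r[1]))

# Roll-up: aggregate the quarter level up to the year level.
by_year = aggregate(facts, lambda r: r[0])

# Slicing: one dimension is set to a particular value (year = 2023).
slice_2023 = [r for r in facts if r[0] == 2023]

# Dicing: a range selection on one or more dimensions
# (year between 2023 and 2024, region restricted to EU).
dice = [r for r in facts if 2023 <= r[0] <= 2024 and r[2] in {"EU"}]

print(by_year)                    # {2023: 60, 2024: 90}
print(by_quarter)
print(len(slice_2023), len(dice)) # 3 3
```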

