CPTR 424 Review 3

_______ is the problem of placing instances into the correct one of several possible categories.

Classification

A column-oriented table system as in HBase is exactly the same as a relational table in the relational model with a flexible way to add new columns.

False

Atomicity in HBase supports multi-row updates as in the relational model.

False

Big data refers to massively large, structured data sets with well-defined schemas.

False

Cell values and their timestamps in HBase are always stored separately from their row keys, column family names, and column qualifier names.

False

Column families in HBase can be defined dynamically after a table is created.

False

HTML has emerged as a robust standard for document storage, exchange, and retrieval that facilitates the development of online applications.

False

Histograms are rules of thumb that can be used to help solve problems.

False

Hive is a fully functional relational database built on top of Hadoop.

False

HiveQL is a real-time, SQL-like query language for large datasets in Hive.

False

A major area of disagreement is how _______ property laws should be applied to software, or whether they should apply at all to digital media.

Intellectual

In HDFS, distributing the blocks of a file across multiple computers supports __________, thus leading to faster computational capabilities.

Parallel processing

A data warehouse is optimized for data _______ as opposed to processing transactions.

Retrieval

Data-intensive applications are more concerned with __________ and __________ than with data consistency as provided by the ACID (atomicity, consistency, isolation, and durability) properties of relational database systems.

Scalability, performance

An XML _______ defines the organization and data types of an XML structure.

Schema

HTTP is a _______ protocol with no facility for remembering previous interactions, which creates a problem for e-commerce.

Stateless

A key-value pair system is similar to the functionality of a hash table.

True
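
A minimal Python sketch of the hash-table analogy, assuming a single-process, in-memory store. The class `KeyValueStore` and its `put`/`get`/`delete` methods are hypothetical and do not mirror any particular product's API. The key is the only access path, and the store treats the value as opaque.

```python
from typing import Any, Optional


class KeyValueStore:
    """Hypothetical toy key-value store backed by a Python dict (a hash table)."""

    def __init__(self) -> None:
        self._data: dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        # Insert or overwrite; the key is hashed directly to its slot.
        self._data[key] = value

    def get(self, key: str) -> Optional[Any]:
        # Lookup is by key only; the store never interprets the value.
        return self._data.get(key)

    def delete(self, key: str) -> None:
        # Remove the pair if present; missing keys are ignored.
        self._data.pop(key, None)


store = KeyValueStore()
store.put("user:42", {"name": "Ada", "cart": ["book"]})
print(store.get("user:42"))  # {'name': 'Ada', 'cart': ['book']}
store.delete("user:42")
print(store.get("user:42"))  # None
```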

In a document-oriented system, a document is similar to a row in a relational table, having a common structure with the ability to vary the types of fields in the document.

True

Most businesses are in the early stages of discovering how to extract meaningful knowledge from big data.

True

NewSQL systems are also beginning to appear on the market as alternatives to RDBMS and NoSQL systems.

True

SAP Infinite Insight can access data from any ODBC-compliant database.

True

The World Wide Web was proposed by A. Tim Berners-Lee B. DARPA C. Bell Labs D. E.F. Codd

A. Tim Berners-Lee

The Internet developed from an older communications network called A. DARPA B. TCP/IP C. Arpanet D. Sabre

C. Arpanet

The European Union privacy laws are based on A. the UN Charter B. the Paris Convention C. OECD guidelines D. the Madrid Protocol

C. OECD guidelines

The datatype provided by Oracle that allows users to create a table containing XML objects is A. XML Raw B. XML Auto C. XMLType D. UML

C. XMLType

HTTP is an example of A. a URL B. hypertext C. a communications protocol D. a markup language

C. a communications protocol

If the values of an attribute, A, are uniformly distributed in a table, it means A. each tuple has a different value of A B. the tuples are arranged in increasing order by value of A C. for each value of A, there are the same number of tuples D. the values of A form a normal distribution

C. for each value of A, there are the same number of tuples

Columns in HBase are known as __________.

Column qualifiers

In Oracle, the function that is used with XML object tables and returns values without tags is A. SHRED B. RAW C. AUTO D. EXTRACTVALUE

D. EXTRACTVALUE

In contrast to data warehouses, operational databases support what type of processing? A. DSS B. data mining C. OLAP D. OLTP

D. OLTP

When performing a join, the choice of method depends only on the size of the files.

False

In a relational database, data is _______ to reduce redundancy, which promotes data integrity.

Normalized

The number of column qualifiers in a column family can be defined dynamically after a table is created.

True

There are two competing methodologies for designing data warehouses: top-down and bottom-up.

True

Data warehouse appliances typically include servers, operating systems, and DBMSs already installed and optimized for data warehouse creation and processing.

True

For XML to be used in an application, it is necessary to parse XML documents.

True
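
A short sketch of parsing with Python's standard `xml.etree.ElementTree` module, using a hypothetical `<catalog>` document. It shows the root element, reading an attribute from a start tag, and pulling out element text without tags (similar in spirit to Oracle's EXTRACTVALUE). Text that is not well formed fails to parse at all; note that this module checks well-formedness only, not validity against a schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance document; <catalog> is the root element.
doc = """<catalog>
  <book id="b1" lang="en"><title>Intro to SQL</title></book>
  <book id="b2"><title>XML Basics</title></book>
</catalog>"""

root = ET.fromstring(doc)  # raises ET.ParseError if the text is not well formed
print(root.tag)  # catalog

for book in root.findall("book"):
    # .get() reads an attribute from the start tag; .text is the content without tags
    print(book.get("id"), book.find("title").text)


def is_well_formed(text: str) -> bool:
    """Return True if the text parses as XML (well formed), False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed("<a><b></a>"))  # False: <b> is never closed
```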

NoSQL systems were designed to provide real-time access to big data stores as well as row-level inserts, updates, and deletes on data that may not have a well-defined schema.

True

Standard Generalized Markup Language (SGML) is a meta-language that allows users to define their own markup languages.

True

Which of the following organizations has published a Code of Ethics and Professional Practice for Computing Professionals? A. ACM B. WIPO C. The European Union D. The United Nations

A. ACM

Codes of conduct for software professionals are published by A. ACM and IEEE B. the US government C. the United Nations D. all of these

A. ACM and IEEE

The term coined by Howard Dresner to mean using fact-based support systems to improve business decision-making is A. Business Intelligence B. Data mining C. Data warehousing D. Decision Support Systems

A. Business Intelligence

__________ refers to inexpensive computers that are widely available and can easily be combined to form parallel computing clusters.

Commodity hardware

Two important measures connected with association rules are support and ___________.

Confidence
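
A worked sketch of the two measures on hypothetical market baskets, using the usual definitions: support(X => Y) is the fraction of all transactions that contain every item of X and Y, and confidence(X => Y) is support(X and Y together) divided by support(X). `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

# Hypothetical market baskets: each transaction is a set of items.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]


def support(itemset: set[str]) -> Fraction:
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for basket in baskets if itemset <= basket)
    return Fraction(hits, len(baskets))


def confidence(lhs: set[str], rhs: set[str]) -> Fraction:
    """Of the transactions containing lhs, the fraction that also contain rhs."""
    return support(lhs | rhs) / support(lhs)


print(support({"bread", "milk"}))       # 1/2  (2 of 4 baskets)
print(confidence({"bread"}, {"milk"}))  # 2/3  ((2/4) / (3/4))
```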

If a query involves multiple selection criteria, we would use a __________ condition.

Conjunctive

The first SQL version to include OLAP functions was A. SQL1 B. SQL1-89 revision C. SQL2 in 1992 D. SQL:1999

D. SQL:1999

In data mining, a decision tree is a technique for developing A. association rules B. sequential patterns C. neural networks D. classification rules

D. classification rules

Impression is a statistical method for predicting the value of an attribute, Y, (the dependent variable), given the values of attributes X1, X2, ..., Xn (the independent variables).

False

Many organizations started to use standard database technology to collect, store, and process massive amounts of their operational data in the 1950s.

False

Unlike an operational database, for which requirements can be specified in advance, a data warehouse does not support ad hoc queries.

False

An HDFS cluster is composed of __________ and __________.

Nodes, racks

The term __________ is a characteristic of a system that can continue to operate even in the presence of network partitions.

Partition tolerance

A technique that is often used to represent relational algebra expressions is a query _______, which is a graphical representation of the operations and operands in a relational algebra expression.

Tree

A partition in Hive provides a way to organize a table according to a specific column value to provide faster access to a sub-portion of the data.

True

In privacy legislation, an 'Opt-out' provision means A. data can be shared unless the individual requests not to B. data cannot be shared unless the individual chooses to C. data can never be shared under any circumstances D. data can be shared regardless of the individual's wishes

A. data can be shared unless the individual requests not to

The process of checking data for validity and integrity before placing it in a data warehouse is called A. data cleaning B. reformatting C. data integration D. viewing

A. data cleaning

Data cubes of dimension higher than 3 are called A. hypercubes B. networks C. stars D. dimension tables

A. hypercubes

The type of reasoning used for data mining is A. induction B. deduction C. generalization D. abstraction

A. induction

An index in which, for each value of an attribute, the IDs of the tuples having that value are stored, is a A. join index B. dense index C. hierarchical index D. B+ tree index

A. join index

The default method of doing joins is A. nested loops B. sort-merge C. using an index D. using a hash key

A. nested loops
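
Nested loops serves as the fallback because it needs no index and no sort order, and it works for any join condition. A tuple-at-a-time Python sketch over hypothetical `employees` and `departments` lists (real systems usually read whole blocks into buffers rather than single tuples):

```python
# Hypothetical relations as lists of dicts, joined on "dept_id".
employees = [
    {"name": "Ann", "dept_id": 1},
    {"name": "Bob", "dept_id": 2},
    {"name": "Cy", "dept_id": 1},
]
departments = [
    {"dept_id": 1, "dname": "Sales"},
    {"dept_id": 2, "dname": "R&D"},
]


def nested_loop_join(outer, inner, attr):
    """Compare every outer tuple with every inner tuple; no index or sort needed."""
    result = []
    for r in outer:        # one pass over the outer relation
        for s in inner:    # full scan of the inner relation for each outer tuple
            if r[attr] == s[attr]:
                result.append({**r, **s})
    return result


for row in nested_loop_join(employees, departments, "dept_id"):
    print(row["name"], row["dname"])
```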

For a work to be copyrighted, the US copyright law requires all of the following EXCEPT A. publication B. originality C. fixation D. expression

A. publication

An XML instance document that conforms to its schema is called A. schema-valid B. well formed C. type-valid D. document-valid

A. schema-valid

The Oracle cost-based optimizer stores all of the following statistics for a table EXCEPT A. the number of users authorized to read the table B. the number of rows in the table C. the average length of a row D. the amount of unused space per block

A. the number of users authorized to read the table

A __________ is a type of NoSQL system where a key serves as an index to find an associated value.

Key-value pair

The term __________ refers to the delay between the time a query is submitted and the time the results are returned.

Latency

_______ refers to the ease with which users can accomplish tasks the first time they see the design.

Learnability

When a query is written on a view, one way it can be executed is through query ___________, which replaces the reference in the WHERE clause by the view definition.

Modification

A __________ occurs when a sub-network becomes separate from the rest of the network due to a node failure.

Network partition

Which of the following relational algebra projections will normally have the lowest cost? A. the projection list contains a primary key B. the projection list contains a secondary key C. the projection list contains an attribute on which the file is sorted D. the projection list has only one attribute

A. the projection list contains a primary key

In an XML instance document, the element that contains all others is called A. the root B. an attribute C. an entity D. a tag

A. the root

The Oracle package that allows programmers to embed PL/SQL statements directly in HTML pages is A. PSP B. PL/SQL Web Toolkit C. XML D. PHP

A. PSP

Which of the following is not a data warehouse modeling system? A. Secured Online Analytical Processing (SOLAP) B. Multidimensional Online Analytical Processing (MOLAP) C. Relational Online Analytical Processing (ROLAP) D. Hybrid Online Analytical Processing (HOLAP)

A. Secured Online Analytical Processing (SOLAP)

Privacy as 'the right to be left alone' was defined by A. Sir William Blackstone B. Louis Brandeis C. the US Constitution D. Sir David Calcutt

A. Sir William Blackstone

In Oracle, the user can force the system to use a particular index by including in the SQL statement A. a hint B. an ORDER BY C. a join query D. an EXPLAIN PLAN

A. a hint

In data mining, the items in a customer transaction are referred to as A. a market basket B. a market analysis C. time series D. a sale inventory

A. a market basket

The Internet developed from _______, a communications network that was created in the 1960s.

Arpanet

XML Elements can have _______ whose names and values are shown inside the element's start tag.

Attributes

_______ offers Enterprise Miner, which includes tools for the entire data mining process, from data preparation to scoring. A. Oracle B. SAS C. IBM D. Microsoft

B. SAS

All of the following are factors considered in costing out a query execution plan EXCEPT A. processing time B. SQL translation time C. number of disk accesses D. amount of memory needed

B. SQL translation time

The World Wide Web proposal introduced all of these technologies EXCEPT A. URLs B. TCP/IP C. HTTP D. HTML

B. TCP/IP

In data mining, rules in which a first event implies the second, where both occur at the same time are called A. classification rules B. association rules C. time series patterns D. sequential rules

B. association rules

The cheapest possible join of tables A and B has cost A. b(A) + (b(A) * b(B)) B. b(A) + b(B) C. b(A+B) D. log2(b(A) + b(B))

B. b(A) + b(B)
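
A worked comparison of the first two options, assuming b(A) and b(B) are the numbers of blocks in A and B, one buffer block per table, A as the outer table, and hypothetical sizes b(A) = 100 and b(B) = 50. Writing the result is not counted in either figure.

```latex
\begin{aligned}
\text{block nested loop:}\quad & b(A) + b(A)\cdot b(B) = 100 + 100 \cdot 50 = 5100 \text{ block reads}\\
\text{each block read once:}\quad & b(A) + b(B) = 100 + 50 = 150 \text{ block reads}
\end{aligned}
```

The lower figure is reachable only in favorable cases, for example when both files are already sorted on the join attribute, or when the smaller table fits entirely in memory.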

On the client side, a common solution to the problem of maintaining state during a transaction is by using A. a single communications session B. cookies C. servlets D. entities

B. cookies
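
A small sketch using Python's standard `http.cookies.SimpleCookie` to show the round trip: the server issues a session identifier in a `Set-Cookie` header, and on the next request the browser sends it back so the server can look up the same session. The `sessions` table and the value `abc123` are hypothetical.

```python
from http.cookies import SimpleCookie

# Hypothetical server-side session table keyed by session ID.
sessions = {"abc123": {"cart": ["book"]}}

# Response: the server asks the browser to store a session cookie.
outgoing = SimpleCookie()
outgoing["session_id"] = "abc123"
outgoing["session_id"]["path"] = "/"
print(outgoing.output())  # the Set-Cookie header line

# Next request: the browser returns the cookie, so the server can find the
# same session even though HTTP itself remembers nothing between requests.
incoming = SimpleCookie()
incoming.load("session_id=abc123")
session = sessions[incoming["session_id"].value]
print(session["cart"])  # ['book']
```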

A multidimensional matrix data model used in data warehouses is called a A. ROLAP system B. data cube C. data network D. data tree

B. data cube

If a table is stored in packed form, it means A. each page is completely full B. each page contains only records of that table C. pages can contain records from several tables D. records are arranged in pages in order by primary key

B. each page contains only records of that table

To be eligible for copyright protection, an intellectual work must meet conditions of A. non-obviousness, fixation, and utility B. originality, expression, and fixation C. non-obviousness, expression, and utility D. originality, performance, and fixation

B. originality, expression, and fixation

A stateless protocol is one in which A. no single server is used B. previous interactions are not remembered C. the government does not receive messages D. all messages are part of a long session

B. previous interactions are not remembered

In a FLWOR expression, the R stands for A. result B. return C. retrieve D. resource

B. return

Combining or aggregating data in a multidimensional data model is called A. pivoting B. rollup C. drill-down D. slicing

B. rollup
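
A minimal rollup sketch over hypothetical sales facts: the location dimension is climbed from city to state, so city-level detail is summed away. Drill-down is the reverse direction, back toward more detail.

```python
from collections import defaultdict

# Hypothetical fact rows: (city, state, quarter, sales)
facts = [
    ("Boston", "MA", "Q1", 100),
    ("Worcester", "MA", "Q1", 40),
    ("Boston", "MA", "Q2", 120),
    ("Albany", "NY", "Q1", 70),
]


def rollup_to_state(rows):
    """Aggregate city-level sales up the location hierarchy to state level."""
    totals = defaultdict(int)
    for _city, state, quarter, sales in rows:
        totals[(state, quarter)] += sales  # cities in the same state are combined
    return dict(totals)


print(rollup_to_state(facts))
# {('MA', 'Q1'): 140, ('MA', 'Q2'): 120, ('NY', 'Q1'): 70}
```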

Instructing a Web browser on how to present data is performed by A. XML B. style sheets C. scripting languages D. cookies

B. style sheets

Which aspect(s) of software can be copyrighted? A. the algorithm but not the program B. the program but not the algorithm C. both the algorithm and the program D. the algorithm and the documentation

B. the program but not the algorithm

The process of checking a query to verify that the objects referred to in the query are actual database objects is called A. syntax checking B. validation C. translation D. optimization

B. validation

Privacy is not a concern for big data collection.

False

Table design in HBase is similar to table design in the relational model.

False

The set operations of intersection and difference cannot be done on files that are union-compatible, having identical structures.

False

The terms Internet and World Wide Web are synonymous.

False

The traditional aggregate functions of SUM, COUNT, MAX, MIN, and AVG cannot be used in queries for a data warehouse.

False

There are 5 steps in the data mining process adapted from a model called CRISP-DM: Business Understanding, Data Understanding, Data Preparation, Modeling, and Deployment.

False

Unlike the Internet, the World Wide Web is a strictly organized information resource whose potential is limited.

False

The two main components of Hadoop are __________ and __________.

HDFS, MapReduce parallel programming paradigm

__________ is generally considered to be the system that initiated the era of big data storage and analytics.

Hadoop
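
A single-process Python simulation of the MapReduce paradigm (word count), meant only to show the map, shuffle, and reduce phases; it does not use Hadoop's actual Java API, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(record: str) -> Iterator[tuple[str, int]]:
    # Mapper: emit (word, 1) for every word in one input record.
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle: group every emitted value under its key.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Reducer: combine all values that share one key.
    return key, sum(values)


# Hypothetical input already split into blocks; on a real cluster each block
# would be mapped in parallel on (or near) the node that stores it.
blocks = ["big data big value", "data value data"]
pairs = (pair for block in blocks for pair in map_phase(block))
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)  # {'big': 2, 'data': 3, 'value': 2}
```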

NoSQL systems use __________ to distribute and replicate data over commodity hardware.

Horizontal scaling

An alternative method of handling views is to materialize them, pre-computing them from the definition and storing them for later use.

True

Column-oriented systems are based on concepts as originally defined in Google's BigTable system.

True

Data mining can model scenarios to help determine the best placement of equipment.

True

The Hive data warehouse system for Hadoop operates in schema-on-read mode.

True

The advantage of a distributed file system is that it is capable of representing data sets that are too large to fit within the storage capacity of a single machine by distributing data across a network of machines.

True

Knowing how to extract __________ from big data is one of the most important aspects to collecting and analyzing big data.

Value

The law that protects patients from disclosure of their health records is A. HIPAA B. Gramm-Leach-Bliley Act C. the Lanham Act D. the Freedom of Information Act

A. HIPAA

Which of the following is not something organizations hope to accomplish through data mining? A. Minimize profits B. Predict future behavior C. Classify items by placing them in the correct categories D. Identify the existence of an activity or an event

A. Minimize profits

Segments of a data warehouse devoted to specific subject matter are called A. DSSs B. OLAPs C. data marts D. data mines

C. data marts

To file for a patent, an inventor must A. keep the formula or plans for the device secret B. publish the formula or plans publicly C. describe the device in detail in the application D. submit an idea for a device

C. describe the device in detail in the application

A dense index is one that A. is clustered B. has an entry for each value of the indexed attribute C. has an entry for each tuple D. is on a secondary key

C. has an entry for each tuple

Some systems, including Oracle, store more information about the distribution of values, maintaining _______, which are graphs that display the frequencies of different values of attributes. A. hierarchy charts B. heuristics C. histograms D. projections

C. histograms

Drill-down of a data cube means A. reducing the dimension of the cube B. selecting only certain values of some attribute(s) C. providing more detail on some dimension D. showing a different dimension of interest

C. providing more detail on some dimension

The Advance Passenger Information Act allowed the disclosure of information about airline passengers to A. the FBI B. the CIA C. the Office of Homeland Security D. the FAA

C. the Office of Homeland Security

For projection, the most significant cost factor is usually A. the cost of eliminating the attributes not on the projection list B. the cost of eliminating the tuples that have null values for the attributes on the projection list C. the cost of eliminating duplicates D. the cost of writing the results

C. the cost of eliminating duplicates
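
A minimal sketch of why duplicate elimination dominates: dropping attributes is a cheap per-tuple step, but removing duplicates needs a sort (or hash) over the whole result. The relation `students` is hypothetical. When the projection list includes the primary key, no duplicates can arise, which is why that case is normally cheapest.

```python
# Hypothetical relation Student(id, major, year) as a list of tuples.
students = [
    (1, "CS", 2),
    (2, "Math", 3),
    (3, "CS", 2),
    (4, "CS", 4),
]


def project(rows, positions):
    """Keep only the listed attribute positions, then eliminate duplicates."""
    narrowed = [tuple(row[i] for i in positions) for row in rows]  # cheap step
    # Duplicate elimination: sort so equal tuples are adjacent, then keep
    # each tuple only if it differs from the previous one kept.
    narrowed.sort()
    result = []
    for t in narrowed:
        if not result or result[-1] != t:
            result.append(t)
    return result


print(project(students, [1, 2]))  # [('CS', 2), ('CS', 4), ('Math', 3)]
print(len(project(students, [0, 1])))  # 4: the key id keeps every tuple distinct
```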

The selection size of an attribute is A. the number of values of the attribute B. the number of tuples in the relation C. the number of tuples expected to have a specific value for the attribute D. the maximum number of tuples that can have a specific value for the attribute

C. the number of tuples expected to have a specific value for the attribute

In a data warehouse, the data can be checked for integrity and validity, a process called data _______, to ensure its quality prior to loading it into the warehouse.

Cleaning

The use of cookies and spyware to collect _______ data, which is a record of every keystroke on a user's computer, and the increasing use of data mining can further erode privacy.

Clickstream

__________ in HBase provide a way to conceptually organize columns that have the same access patterns into groups.

Column families

Oracle estimates frequencies of column values using A. a uniform distribution assumption B. a normal distribution C. regression analysis D. histograms

D. histograms
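
A sketch contrasting two estimates of selection size (the expected number of tuples with a given value) on a hypothetical, skewed `major` column. Under the uniform distribution assumption the estimate is n divided by the number of distinct values; a frequency histogram records the actual count per value. This is a simplification: optimizers may keep bucketed histograms rather than one entry per value.

```python
from collections import Counter

# Hypothetical column values for attribute "major" in a 10-tuple relation.
majors = ["CS"] * 6 + ["Math"] * 3 + ["Art"]


def uniform_estimate(values):
    """Selection size assuming uniform distribution: n / number of distinct values."""
    return len(values) / len(set(values))


def histogram_estimate(values, target):
    """Selection size read from a frequency histogram of the column."""
    histogram = Counter(values)  # value -> frequency
    return histogram[target]


print(uniform_estimate(majors))            # about 3.33 for every value
print(histogram_estimate(majors, "CS"))    # 6
print(histogram_estimate(majors, "Art"))   # 1
```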

All of the following are methods for protecting intellectual property EXCEPT A. trademarks B. patents C. copyright D. interfaces

D. interfaces

In a tree representing a semi-structured data model, all data resides in the A. root B. attributes C. edges D. leaves

D. leaves

Constructing a temporary table to use for the next operation is called A. pipelining B. indexing C. pass through D. materialization

D. materialization

In data mining, overfitting the curve is a flaw that occurs with A. regression B. time series patterns C. classification rules D. neural networks

D. neural networks

A star schema consists of A. many fact tables and many dimension tables B. many fact tables and one dimension table C. one fact table and one dimension table D. one fact table and many dimension tables

D. one fact table and many dimension tables
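
A small star schema built with Python's standard `sqlite3` module in an in-memory database. The table names (`Sales`, `Store`, `Product`) and rows are hypothetical. The query shows the typical pattern: join the single fact table to its dimension tables, then aggregate.

```python
import sqlite3

# Hypothetical star schema: one fact table (Sales) referencing two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Store   (store_id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE Product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE Sales (
    store_id   INTEGER REFERENCES Store(store_id),
    product_id INTEGER REFERENCES Product(product_id),
    amount     REAL
);
INSERT INTO Store   VALUES (1, 'Boston'), (2, 'Albany');
INSERT INTO Product VALUES (10, 'Books'), (20, 'Music');
INSERT INTO Sales   VALUES (1, 10, 25.0), (1, 20, 10.0), (2, 10, 15.0), (1, 10, 5.0);
""")

# Star query: join the fact table to each dimension, then group and sum.
rows = conn.execute("""
    SELECT s.city, p.category, SUM(f.amount)
    FROM Sales f
    JOIN Store s   ON f.store_id = s.store_id
    JOIN Product p ON f.product_id = p.product_id
    GROUP BY s.city, p.category
    ORDER BY s.city, p.category
""").fetchall()
print(rows)  # [('Albany', 'Books', 15.0), ('Boston', 'Books', 30.0), ('Boston', 'Music', 10.0)]
```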

Materializing views means A. making new base tables B. updating the views C. making views dynamic D. precomputing them and storing them

D. precomputing them and storing them

The ability to do _______ between XML and databases means being able to accept data in XML form, transform it into a relational database, query and update the database in the usual way, access it using SQL, and transform the output back into XML format, without loss of content. A. pipelining B. SGML C. transforming D. round-tripping

D. round-tripping

Most copyright laws are based on A. a United Nations declaration B. OECD guidelines C. the Madrid Protocol D. the Berne Convention

D. the Berne Convention

Which of the following is true about successive relational algebra project operations? A. they commute B. they can be replaced by selects C. they always distribute over joins D. they can be reduced to the final project

D. they can be reduced to the final project

In the US, the Lanham Act protects A. patents B. copyrights C. trade secrets D. trademarks

D. trademarks

The Madrid Protocol is designed to protect A. patents B. copyrights C. trade secrets D. trademarks

D. trademarks

An instance document that obeys the rules of the XML language is called A. type valid B. schema valid C. integral D. well formed

D. well formed

A __________ is a type of NoSQL system where an individual record is referred to as a document that can be encoded in a format such as XML or JavaScript Object Notation.

Document-oriented system


Kaugnay na mga set ng pag-aaral

SS.7.C.1.4: Natural Rights and Declaration of Independence

View Set

Fluid and Electrolyte/Thermoregulation

View Set

LET Specialization: Social Studies

View Set