MSS Final


What does Web content mining involve? A) analyzing the universal resource locator in Web pages B) analyzing the unstructured content of Web pages C) analyzing the pattern of visits to a Web site D) analyzing the PageRank and other metadata of a Web page

b

A large storage location that can hold vast quantities of data (mostly unstructured) in its native/raw format for future/potential analytics consumption is referred to as a(n) A) extended ASP. B) data cloud. C) data lake. D) relational database.

c

Data warehouses are subsets of data marts.

f

Because of its successful application to retail business problems, association rule mining is commonly called ________.

market-basket analysis

Articles and auxiliary verbs are assigned little value in text mining and are usually filtered out.

t
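
The filtering described in this card can be sketched in a few lines of Python. This is a minimal illustration, not a standard implementation: the `STOP_WORDS` set and the `filter_stop_words` helper are hypothetical names, and the word list is a small hand-picked sample of articles and auxiliary verbs rather than a complete stop-word list.

```python
import re

# Assumed, deliberately tiny stop-word list: articles and auxiliary verbs only.
STOP_WORDS = {
    "a", "an", "the",
    "is", "are", "was", "were", "be", "been",
    "has", "have", "had", "do", "does", "did",
    "will", "would", "can", "could",
}

def filter_stop_words(text):
    """Lowercase the text, split it into words, and drop the stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w not in STOP_WORDS]

result = filter_stop_words("The hotel was a disaster and the staff were rude")
print(result)
```

Only the words that carry content ("hotel", "disaster", "staff", "rude") and the conjunction "and", which is not in this sample list, survive the filter.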

What is search engine optimization (SEO) and why is it important for organizations that own Web sites?

Search engine optimization (SEO) is the intentional activity of affecting the visibility of an e-commerce site or a Web site in a search engine's natural (unpaid or organic) search results. In general, the higher ranked on the search results page, and more frequently a site appears in the search results list, the more visitors it will receive from the search engine's users.

Describe cluster analysis and some of its applications.

Cluster analysis is an exploratory data analysis tool for solving classification problems.

________ is a segmentation metric for social networks that measures the strength of the bonds between actors in a social network.

Cohesion

knowledge

Concepts and relationships inside an individual's mind that enable the human to identify matching patterns in what he or she perceives

At a very high level, the text mining process can be broken down into three consecutive tasks, the first of which is to establish the ________.

Corpus

________ was proposed in the mid-1990s by a European consortium of companies to serve as a nonproprietary standard methodology for data mining.

CRISP-DM

What is the definition of a data warehouse (DW) in simple terms?

In simple terms, a data warehouse (DW) is a pool of data produced to support decision making; it is also a repository of current and historical data of potential interest to managers throughout the organization.

Knowledge extraction, pattern analysis, data archaeology, information harvesting, pattern searching, and data dredging are all alternative names for ________.

data mining

How would you describe information extraction in text mining?

Information extraction is the identification of key phrases and relationships within text by looking for predefined objects and sequences in text by way of pattern matching.

Natural language processing (NLP), a subfield of artificial intelligence and computational linguistics, is an important component of text mining. What is the definition of NLP?

NLP is a discipline that studies the problem of "understanding" the natural human language, with the view of converting depictions of human language into more formal representations in the form of numeric and symbolic data that are easier for computer programs to manipulate.

________, also called homonyms, are syntactically identical words with different meanings.

Polysemes

________ is a connections metric for social networks that measures the ties that actors in a network have with others that are geographically close.

Propinquity

________, or "The Extended ASP Model," is a creative way of deploying information system applications where the provider licenses its applications to customers for use as a service on demand (usually over the Internet).

SaaS (software as a service)

________ is a technique used to detect favorable and unfavorable opinions toward specific products and services using large numbers of textual data sources.

Sentiment analysis

The hub-and-spoke data warehouse model uses a centralized warehouse feeding dependent data marts.

TRUE

The data field "ethnic group" can be best described as A) nominal data. B) interval data. C) ordinal data. D) ratio data.

a

All of the following statements about data mining are true EXCEPT A) the process aspect means that data mining should be a one-step process to results. B) the novel aspect means that previously unknown patterns are discovered. C) the potentially useful aspect means that results should lead to some business benefit. D) the valid aspect means that the discovered patterns should hold true on new data.

a

Clustering partitions a collection of things into segments whose members share A) similar characteristics. B) dissimilar characteristics. C) similar collection methods. D) dissimilar collection methods.

a

Identifying and preventing incorrect claim payments and fraudulent activities falls under which type of data mining applications? A) insurance B) retailing and logistics C) customer relationship management D) computer hardware and software

a

In data mining, finding an affinity of two products to be commonly together in a shopping cart is known as A) association rule mining. B) cluster analysis. C) decision trees. D) artificial neural networks.

a

In estimating the accuracy of data mining (or other) classification models, the true positive rate is A) the ratio of correctly classified positives divided by the total positive count. B) the ratio of correctly classified negatives divided by the total negative count. C) the ratio of correctly classified positives divided by the sum of correctly classified positives and incorrectly classified positives. D) the ratio of correctly classified positives divided by the sum of correctly classified positives and incorrectly classified negatives.

a
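
As a worked illustration of option A, the true positive rate can be computed directly from paired actual and predicted labels. The sketch below is a minimal one; the function name `true_positive_rate` and the sample labels are made up for the example, with 1 standing for the positive class.

```python
def true_positive_rate(actual, predicted):
    """Correctly classified positives divided by the total positive count."""
    true_pos = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    total_pos = sum(1 for a in actual if a == 1)
    if total_pos == 0:
        raise ValueError("no positive cases in the actual labels")
    return true_pos / total_pos

# Hypothetical labels: four actual positives, three of them predicted positive.
actual = [1, 1, 1, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 1]
tpr = true_positive_rate(actual, predicted)
print(tpr)  # 3 of 4 positives found -> 0.75
```

The false positive at the fifth position does not affect the result, because the denominator counts only cases that are actually positive.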

In text analysis, what is a lexicon? A) a catalog of words, their synonyms, and their meanings B) a catalog of customers, their words, and phrases C) a catalog of letters, words, phrases, and sentences D) a catalog of customers, products, words, and phrases

a

In text mining, tokenizing is the process of A) categorizing a block of text in a sentence. B) reducing multiple words to their base or root. C) transforming the term-by-document matrix to a manageable size. D) creating new branches or stems of recorded paragraphs.

a

Operational or transaction databases are product oriented, handling transactions that update the database. In contrast, data warehouses are A) subject-oriented and nonvolatile. B) product-oriented and nonvolatile. C) product-oriented and volatile. D) subject-oriented and volatile.

a

When representing data in a data warehouse, using several dimension tables that are each connected only to a fact table means you are using which warehouse structure? A) star schema B) snowflake schema C) relational schema D) dimensional schema

a

_______ is an evolving tool space that promises real-time data integration from a variety of sources, such as relational databases, Web services, and multidimensional databases. A) Enterprise information integration (EII) B) Enterprise application integration (EAI) C) Extraction, transformation, and load (ETL) D) None of these

a

List and briefly describe the six steps of the CRISP-DM data mining process

The six steps, in order: (1) business understanding, (2) data understanding, (3) data preparation, (4) model building, (5) testing and evaluation, and (6) deployment.

What does the scalability of a data mining method refer to? A) its ability to predict the outcome of a previously unknown data set accurately B) its speed of computation and computational costs in using the model C) its ability to construct a prediction model efficiently given a large amount of data D) its ability to overcome noisy data to make somewhat accurate predictions

c

What is the main reason parallel processing is sometimes used for data mining? A) because the hardware exists in most organizations, and it is available to use B) because most of the algorithms used for data mining require it C) because of the massive data amounts and search efforts involved D) because any strategic application requires parallel processing

c

Which data warehouse architecture uses a normalized relational warehouse that feeds multiple data marts? A) independent data marts architecture B) centralized data warehouse architecture C) hub-and-spoke data warehouse architecture D) federated architecture

c

Which kind of data warehouse is created separately from the enterprise data warehouse by a department and not reliant on it for updates? A) sectional data mart B) public data mart C) independent data mart D) volatile data mart

c

Understanding which keywords your users enter to reach your Web site through a search engine can help you understand A) the hardware your Web site is running on. B) the type of Web browser being used by your Web site visitors. C) most of your Web site visitors' wants and needs. D) how well visitors understand your products.

d

When querying a dimensional database, a user went from summarized data to its underlying details. The function that served this purpose is A) dice. B) slice. C) roll-up. D) drill down.

d
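
The difference between roll-up and drill down can be shown on a tiny in-memory data set. This is only a sketch: the `sales` rows, the region and city names, and the two helper functions are invented for illustration, and a real dimensional database would do this with queries against fact and dimension tables.

```python
from collections import defaultdict

# Hypothetical detail rows: (region, city, sales amount).
sales = [
    ("East", "Boston", 120),
    ("East", "New York", 200),
    ("West", "Seattle", 90),
    ("West", "Portland", 60),
]

def roll_up(rows):
    """Aggregate city-level detail into region-level totals."""
    totals = defaultdict(int)
    for region, _city, amount in rows:
        totals[region] += amount
    return dict(totals)

def drill_down(rows, region):
    """Go from a region's summary back to its underlying city detail."""
    return {city: amount for r, city, amount in rows if r == region}

summary = roll_up(sales)
detail = drill_down(sales, "East")
print(summary)  # {'East': 320, 'West': 150}
print(detail)   # {'Boston': 120, 'New York': 200}
```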

Which data mining process/methodology is thought to be the most comprehensive, according to kdnuggets.com rankings? A) SEMMA B) proprietary organizational methodologies C) KDD Process D) CRISP-DM

d

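Which data warehouse architecture uses metadata from existing data warehouses to create a hybrid logical data warehouse comprised of data from the other warehouses? A) independent data marts architecture B) centralized data warehouse architecture C) hub-and-spoke data warehouse architecture D) federated architecture

d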

Why is a performance management system superior to a performance measurement system? A) because performance measurement systems are only in their infancy B) because measurement automatically leads to problem solution C) because performance management systems cost more D) because measurement alone has little use without action

d

Data preparation, the third step in the CRISP-DM data mining process, is more commonly known as ________.

data preprocessing

The three main types of data warehouses are data marts, operational ________, and enterprise data warehouses.

data stores

One way to accomplish privacy and protection of individuals' rights when data mining is by ________ of the customer records prior to applying data mining applications, so that the records cannot be traced to an individual.

de-identification

Performing extensive ________ to move data to the data warehouse may be a sign of poorly managed data and a fundamental lack of a coherent data management strategy.

extraction, transformation, and load (ETL)

Because the recession has raised interest in low-cost open source software, it is now set to replace traditional enterprise software.

f

Bill Inmon advocates the data mart bus architecture whereas Ralph Kimball promotes the hub-and-spoke architecture, a data mart bus architecture with conformed dimensions.

f

Companies understand that when their product goes "viral," the content of the online conversations about their product does not matter, only the volume of conversations.

f

Data mining can be very useful in detecting patterns such as credit card fraud, but is of little help in improving sales.

f

Data that is collected, stored, and analyzed in data mining is often private and personal. There is no way to maintain individuals' privacy other than being very careful about physical data security.

f

Data warehouse administrators (DWAs) do not need strong business insight since they only handle the technical aspect of the infrastructure.

f

In sentiment analysis, sentiment suggests a transient, temporary opinion reflective of one's feelings.

f

Market basket analysis is a useful and entertaining way to explain data mining to a technologically less savvy audience, but it has little business significance.

f

Moving the data into a data warehouse is usually the easiest part of its creation.

f

OLTP systems are designed to handle ad hoc analysis and complex queries that deal with many data items.

f

Organizations seldom devote a lot of effort to creating metadata because it is not important for the effective use of data warehouses.

f

Properly integrating data from various databases and other disparate sources is a trivial process.

f

Search engine optimization (SEO) techniques play a minor role in a Web site's search ranking because only well-written content matters.

f

Search engines are only used in the context of the World Wide Web (WWW).

f

Statistics and data mining both look for data sets that are as large as possible.

f

Subject oriented databases for data warehousing are organized by detailed subjects such as disk drives, computers, and networks.

f

Two-tier data warehouse/BI infrastructures offer organizations more flexibility but cost more than three-tier ones.

f

User-initiated navigation of data through disaggregation is referred to as "drill up."

f

With the balanced scorecard approach, the entire focus is on measuring and managing specific financial goals based on the organization's strategy.

f

A(n) ________ is one or more Web pages that provide a collection of links to authoritative Web pages.

hub

Web pages contain both unstructured data and ________, which are connections to other Web pages.

hyperlinks

In ________, a classification method, the complete data set is randomly split into mutually exclusive subsets of approximately equal size and tested multiple times on each left-out subset, using the others as a training set.

k-fold cross-validation
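
The splitting step of k-fold cross-validation can be sketched with the standard library alone. The function name `k_fold_splits`, the fixed `seed`, and the choice of 10 items and 5 folds are assumptions made for the example; model training and scoring on each split are left out.

```python
import random

def k_fold_splits(n_items, k, seed=0):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)       # random assignment to folds
    folds = [indices[i::k] for i in range(k)]  # k mutually exclusive subsets
    for i in range(k):
        test = folds[i]                        # the left-out subset
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

splits = list(k_fold_splits(n_items=10, k=5))
for train, test in splits:
    print(sorted(test), "held out;", len(train), "items used for training")
```

Each item lands in exactly one test fold, so every record is used for testing once and for training k - 1 times.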

________ describe the structure and meaning of the data, contributing to their effective use.

metadata

A(n) ________ data store (ODS) provides a fairly recent form of customer information file.

operational

data

raw facts that are perceivable

A(n) ________ Web site contains links that send traffic directly to your Web site.

referral

Most data warehouses are built using ________ database management systems to control and manage the data.

relational

Customer ________ management extends traditional marketing by creating one-on-one relationships with customers.

relationship

Given that the size of data warehouses is expanding at an exponential rate, ________ is an important issue.

scalability

A(n) ________ engine is a software program that searches for Web sites or files based on keywords.

search

In sentiment analysis, which of the following is an implicit opinion? A) The hotel we stayed in was terrible. B) The customer service I got for my TV was laughable. C) The cruise we went on last summer was a disaster. D) Our new mayor is great for the city.

b

Because of performance and data quality issues, most experts agree that the federated architecture should supplement data warehouses, not replace them.

t

Current use of sentiment analysis in voice of the customer applications allows companies to change their products or services in real time in response to customer sentiment.

t

During classification in data mining, a false positive is an occurrence classified as true by the algorithm while being false in reality.

t

In data mining, classification models help in prediction.

t

In sentiment analysis, it is hard to classify some subjects such as news as good or bad, but easier to classify others, e.g., movie reviews, in the same way.

t

In text mining, if an association between two concepts has 7% support, it means that 7% of the documents had both concepts represented in the same document.

t
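
Support as described in this card is simply the share of documents that contain both concepts. A minimal sketch, assuming each document has already been reduced to a set of concepts; the `support` function and the sample documents are hypothetical.

```python
def support(documents, concept_a, concept_b):
    """Fraction of documents in which both concepts appear together."""
    both = sum(1 for doc in documents if concept_a in doc and concept_b in doc)
    return both / len(documents)

# Hypothetical corpus: each document is the set of concepts found in it.
docs = [
    {"price", "quality"},
    {"price"},
    {"quality", "service"},
    {"price", "quality", "service"},
]
s = support(docs, "price", "quality")
print(s)  # 2 of 4 documents contain both -> 0.5
```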

One way an operational data store differs from a data warehouse is the recency of their data.

t

Regional accents present challenges for natural language processing.

t

The "islands of data" problem in the 1980s describes the phenomenon of unconnected data being stored in numerous locations within an organization.

t

The cost of data storage has plummeted recently, making data mining feasible for more firms.

t

The data warehousing maturity model consists of six stages: prenatal, infant, child, teenager, adult, and sage.

t

With key performance indicators, driver KPIs have a significant effect on outcome KPIs, but the reverse is not necessarily true.

t

Without middleware, different BI programs cannot easily connect to the data warehouse.

t

In what ways does the Web pose great challenges for effective and efficient knowledge discovery through data mining?

The Web is too big, too complex, and too dynamic for effective data mining. It is also not specific to any one domain and contains everything.

Online ________ is a term used for a transaction system that is primarily responsible for capturing and storing data related to day-to-day business functions such as ERP, CRM, SCM, and point of sale.

transaction processing

What is the difference between white hat and black hat SEO activities?

White hat SEO conforms to search engine guidelines. Black hat SEO attempts to improve rankings in ways that search engines disapprove of.

When a word has more than one meaning, selecting the meaning that makes the most sense can only be accomplished by taking into account the context within which the word is used. This concept is known as ________.

word sense disambiguation

How does the use of cloud computing affect the scalability of a data warehouse? A) Cloud computing vendors bring as much hardware as needed to users' offices. B) Hardware resources are dynamically allocated as use increases. C) Cloud vendors are mostly based overseas where the cost of labor is low. D) Cloud computing has little effect on a data warehouse's scalability.

b

Prediction problems where the variables have numeric values are most accurately defined as A) classifications. B) regressions. C) associations. D) computations.

b

Search engine optimization (SEO) is a means by which A) Web site developers can negotiate better deals for paid ads. B) Web site developers can increase Web site search rankings. C) Web site developers index their Web sites for search engines. D) Web site developers optimize the artistic features of their Web sites.

b

What is Six Sigma? A) a letter in the Greek alphabet that statisticians use to measure process variability B) a methodology aimed at reducing the number of defects in a business process C) a methodology aimed at reducing the amount of variability in a business process D) a methodology aimed at measuring the amount of variability in a business process

b

Which approach to data warehouse integration focuses more on sharing process functionality than data across systems? A) extraction, transformation, and load B) enterprise application integration C) enterprise information integration D) enterprise function integration

b

Which of the following BEST enables a data warehouse to handle complex queries and scale up to handle many more requests? A) use of the Web by users as a front-end B) parallel processing C) Microsoft Windows D) a larger IT staff

b

Which of the following is a data mining myth? A) Data mining is a multistep process that requires deliberate, proactive design and use. B) Data mining requires a separate, dedicated database. C) The current state-of-the-art is ready to go for almost any business. D) Newer Web-based tools enable managers of all educational levels to do data mining.

b

Which of the following statements about Web site conversion statistics is FALSE? A) Web site visitors can be classed as either new or returning. B) Visitors who begin a purchase on most Web sites must complete it. C) The conversion rate is the number of people who take action divided by the number of visitors. D) Analyzing exit rates can tell you why visitors left your Web site.

b
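
Option C defines the conversion rate as a simple ratio, which can be written out as follows. The function name and the visitor counts are made up for illustration.

```python
def conversion_rate(actions, visitors):
    """Number of people who take action divided by the number of visitors."""
    if visitors <= 0:
        raise ValueError("visitors must be positive")
    return actions / visitors

# Hypothetical month: 1,200 visitors, 30 of whom completed a purchase.
rate = conversion_rate(actions=30, visitors=1200)
print(f"{rate:.1%}")  # 2.5%
```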

Which broad area of data mining applications analyzes data, forming rules to distinguish between defined classes? A) associations B) visualization C) classification D) clustering

c

In the Target case study, why did Target send maternity ads to a teen? A) Target's analytic model confused her with an older woman with a similar name. B) Target was sending ads to all women in a particular neighborhood. C) Target's analytic model suggested she was pregnant based on her buying habits. D) Target was using a special promotion that targeted all teens in her geographical area.

c

A Web client that connects to a Web server, which is in turn connected to a BI application server, is reflective of a A) one-tier architecture. B) two-tier architecture. C) three-tier architecture. D) four-tier architecture.

c

All of the following are benefits of hosted data warehouses EXCEPT A) smaller upfront investment. B) better quality hardware. C) greater control of data. D) frees up in-house systems.

c

All of the following statements about data mining are true EXCEPT: A) The term is relatively new. B) Its techniques have their roots in traditional statistical analysis and artificial intelligence. C) The ideas behind it are relatively new. D) Intense, global competition makes its application more important.

c

Real-time data warehousing can be used to support the highest level of decision making sophistication and power. The major feature that enables this in relation to handling the data is A) country of (data) origin. B) nature of the data. C) speed of data transfer. D) source of the data.

c

Third party providers of publicly available data sets protect the anonymity of the individuals in the data set primarily by A) asking data users to use the data ethically. B) leaving in identifiers (e.g., name), but changing other variables. C) removing identifiers such as names and social security numbers. D) letting individuals in the data know their data is being accessed.

c
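
Removing identifiers, as in option C, amounts to dropping the identifying fields from every record before the data is shared or mined. The sketch below assumes records are plain dictionaries; the field names in `IDENTIFIER_FIELDS` and the sample record are hypothetical. Dropping direct identifiers is the step the card describes, and it does not by itself guarantee that the remaining fields cannot be combined to re-identify someone.

```python
# Assumed set of fields treated as direct identifiers.
IDENTIFIER_FIELDS = {"name", "ssn"}

def de_identify(records):
    """Return copies of the records with identifier fields removed."""
    return [
        {key: value for key, value in record.items() if key not in IDENTIFIER_FIELDS}
        for record in records
    ]

customers = [
    {"name": "Pat Doe", "ssn": "000-00-0000", "age": 34, "total_spent": 410.5},
]
anonymous = de_identify(customers)
print(anonymous)  # [{'age': 34, 'total_spent': 410.5}]
```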

Understanding customers better has helped Amazon and others become more successful. The understanding comes primarily from A) collecting data about customers and transactions. B) developing a philosophy that is data analytics-centric. C) analyzing the vast data amounts routinely collected. D) asking the customers what they want.

c

information

Data perceived by someone with knowledge to provide meaning

Oper marts are created when operational data needs to be analyzed A) linearly. B) in a dashboard. C) unidimensionally. D) multidimensionally.

d

Briefly describe techniques (or algorithms) that are used for classification modeling.

• Decision tree analysis. Decision tree analysis (a machine-learning technique) is arguably the most popular classification technique in the data mining arena.
• Statistical analysis. Statistical techniques were the primary classification algorithm for many years until the emergence of machine-learning techniques. Statistical classification techniques include logistic regression and discriminant analysis.

What are the three categories of social media analytics technologies and what do they do?

• Descriptive analytics: Uses simple statistics to identify activity characteristics and trends, such as how many followers you have, how many reviews were generated on Facebook, and which channels are being used most often.
• Social network analysis: Follows the links between friends, fans, and followers to identify connections of influence as well as the biggest sources of influence.
• Advanced analytics: Includes predictive analytics and text analytics that examine the content in online conversations to identify themes, sentiments, and connections that would not be

________ statistics help you understand whether your specific marketing objective for a Web page is being achieved.

conversion

Web ________ are used to automatically read through the contents of Web sites.

crawlers/spiders

Data warehouses provide direct and indirect benefits to organizations. Which of the following is an indirect benefit of data warehouses? A) better and more timely information B) extensive new analyses performed by users C) simplified access to data D) improved customer service

d

Which broad area of data mining applications partitions a collection of objects into natural groupings with similar features? A) associations B) visualization C) classification D) clustering

d

What does the robustness of a data mining method refer to? A) its ability to predict the outcome of a previously unknown data set accurately B) its speed of computation and computational costs in using the model C) its ability to construct a prediction model efficiently given a large amount of data D) its ability to overcome noisy data to make somewhat accurate predictions

d

All of the following are true about in-database processing technology EXCEPT A) it pushes the algorithms to where the data is. B) it makes the response to queries much faster than conventional databases. C) it is often used for apps like credit card fraud detection and investment risk management. D) it is the same as in-memory storage technology.

d

A data mining study is specific to addressing a well-defined business task, and different business tasks require A) general organizational data. B) general industry data. C) general economic data. D) different sets of data.

d

Breaking up a Web page into its components to identify worthy words/terms and indexing them using a set of rules is called A) preprocessing the documents. B) document analysis. C) creating the term-by-document matrix. D) parsing the documents.

d

In which stage of extraction, transformation, and load (ETL) into a data warehouse are anomalies detected and corrected? A) transformation B) extraction C) load D) cleanse

d

The basic idea behind a(n) ________ is that it recursively divides a training set until each division consists entirely or primarily of examples from one class.

decision tree
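
The recursive division in this card can be sketched for a single numeric feature. Everything here is an illustrative assumption: the `build_tree` and `classify` helpers, the toy training rows, and the split criterion (fewest misclassified examples), which is chosen for brevity. Established decision tree algorithms typically use measures such as the Gini index or information gain instead.

```python
from collections import Counter

def majority_count(rows):
    """Size of the largest class among the rows."""
    return Counter(label for _, label in rows).most_common(1)[0][1]

def build_tree(rows):
    """rows: list of (value, label). Returns a leaf label or a nested dict."""
    labels = [label for _, label in rows]
    majority = Counter(labels).most_common(1)[0][0]
    values = sorted({value for value, _ in rows})
    # Stop when the division is pure or cannot be split any further.
    if len(set(labels)) == 1 or len(values) == 1:
        return majority
    best = None
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        left = [r for r in rows if r[0] <= threshold]
        right = [r for r in rows if r[0] > threshold]
        errors = (len(left) - majority_count(left)) + (len(right) - majority_count(right))
        if best is None or errors < best[0]:
            best = (errors, threshold, left, right)
    _, threshold, left, right = best
    # Recursively divide each side of the chosen split.
    return {"threshold": threshold, "left": build_tree(left), "right": build_tree(right)}

def classify(tree, value):
    """Walk the tree until a leaf label is reached."""
    while isinstance(tree, dict):
        tree = tree["left"] if value <= tree["threshold"] else tree["right"]
    return tree

training = [(1, "no"), (2, "no"), (3, "yes"), (4, "yes"), (5, "no")]
tree = build_tree(training)
print(tree)
print(classify(tree, 3))
```

On this toy data the first split falls at 2.5 and the second at 4.5, after which every division contains examples from a single class.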

