DSCI 4330 Exam 2

Market Basket Analysis is an example of: A. Cluster analysis B. Regression analysis C. Association analysis D. Classification analysis

C. Association analysis

What data mining technique can be used to improve product placement on the sales floor by revealing the patterns of combinations of retail products purchased together? A. Linear regression analysis of products B. Cluster analysis of products C. Association rule mining D. Classification of products

C. Association rule mining

What is the most popular data mining process in the industry? A. KDD Process B. SEMMA C. CRISP-DM

C. CRISP-DM

"The nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data stored in structured databases." is a definition for ... A. Data warehousing B. Data wrangling C. Data mining D. ETL

C. Data mining

In social media analytics terminology, "the extent to which actors form ties with similar versus dissimilar others" is called... A. Reciprocity B. Network closure C. Homophily D. Multiplexity

C. Homophily

In sentiment analysis, which of the following is an implicit opinion? A. The hotel we stayed in was terrible. B. The cruise we went on last summer was a disaster. C. The customer service I got for my TV was laughable. D. Our new mayor is great for the city.

C. The customer service I got for my TV was laughable

In text analysis, what is a lexicon? A. a catalog of customers, their words, and phrases B. a catalog of letters, words, phrases, and sentences C. a catalog of words, their synonyms, and their meanings D. a catalog of customers, products, words, and phrases

C. a catalog of words, their synonyms, and their meanings

In text mining, tokenizing is the process of A. creating new branches or stems of recorded paragraphs. B. transforming the term-by-document matrix to a manageable size. C. categorizing and separating a block of text in a sentence. D. reducing multiple words to their base or root

C. categorizing and separating a block of text in a sentence
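
As a rough illustration of tokenizing, the sketch below splits a block of text into word tokens. The function name `tokenize` and the regular-expression rule are assumptions made for this example; real text mining tools use more elaborate rules.

```python
import re

def tokenize(text):
    """Split a block of text into lowercase word tokens (illustrative rule only)."""
    # Keep runs of letters, digits, and apostrophes; punctuation and spaces are dropped.
    return re.findall(r"[a-z0-9']+", text.lower())

tokens = tokenize("The hotel we stayed in was terrible.")
print(tokens)
```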

Clustering partitions a collection of things into segments whose members share A. similar collection methods. B. dissimilar characteristics. C. similar characteristics. D. dissimilar collection methods.

C. similar characteristics

All of the following statements about data mining are true except A. the valid aspect means that the discovered patterns should hold true on new data B. the potentially useful aspect means that the results should lead to some business benefit C. the process aspect means that data mining should be a one-step process to results D. the novel aspect means that previously unknown patterns are discovered

C. the process aspect means that data mining should be a one-step process to results

Which task is NOT a supervised learning process? A. Predicting customer attrition B. Segmenting customer base C. Predicting sales D. Detecting credit card fraud

B. Segmenting customer base

A type of unsupervised machine learning that involves finding a hidden pattern in unlabeled data: A. Time series analysis B. Classification C. Linear regression D. Clustering

D. Clustering

In data mining, finding an affinity of two products to be commonly together in a shopping cart is known as A. decision trees. B. cluster analysis. C. logistic regression. D. association rule mining.

D. association rule mining

In unsupervised learning models, there is always a target variable in data and the task is to predict it. True False

False

Using data mining on data about imports and exports can help to detect tax avoidance and money laundering. True False

True

Using only a part of the entire data to train a prediction model, and using the other split to test model performance is a common strategy to prevent over-fitting. True False

True
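
The holdout strategy described in this card can be sketched as follows. The function name `holdout_split`, the 30% test fraction, and the fixed seed are assumptions chosen for the example, not values given in the course material.

```python
import random

def holdout_split(records, test_fraction=0.3, seed=42):
    """Shuffle the records and return (training set, test set)."""
    shuffled = list(records)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    # The first n_test shuffled records are held out; the rest train the model.
    return shuffled[n_test:], shuffled[:n_test]

train, test = holdout_split(range(10))
print(len(train), len(test))
```

The model is fitted on `train` only, and its performance is measured on `test`, which it has never seen.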

When a problem has many attributes that impact the classification of different patterns, decision trees may be a useful approach. True False

True

Which data mining process/methodology is thought to be the most comprehensive, according to kdnuggets.com rankings? A. CRISP-DM B. proprietary organizational methodologies C. KDD Process D. SEMMA

A. CRISP-DM

What software/tool is responsible for processing the documents (Web pages or document files) and placing them into the document database? A. Document indexer B. Metadata engine C. Web crawler D. Search engine

A. Document indexer

Which one is NOT a classification technique? A. Link analysis B. Support vector machines C. Neural networks D. Decision trees

A. Link analysis

Which of the following is NOT an application of sentiment analysis? A. Text generation B. Brand management C. Voice of the customer D. Politics

A. Text generation

Which of the following is an application of text mining? A. All of the others B. Literature-based gene identification C. Customer relationship management D. Deception detection in cybersecurity

A. All of the others

Natural language processing (NLP) is associated with which of the following areas? A. all of these B. computational linguistics C. artificial intelligence D. text mining

A. all of these

Which broad area of data mining applications analyzes data, forming rules to distinguish between defined classes? A. classification B. visualization C. associations D. clustering

A. classification

Which broad area of data mining applications partitions a collection of objects into natural groupings with similar features? A. clustering B. regression C. visualization D. associations

A. clustering

All of the following are challenges associated with natural language processing EXCEPT A. dividing up a text into individual words in English. B. understanding the context in which something is said. C. distinguishing between words that have more than one meaning. D. recognizing typographical or grammatical errors in texts.

A. dividing up a text into individual words in English

What is the data source for Web structure mining? A. links embedded in Web documents B. Social media C. Textual content of the Web D. Web usage data

A. links embedded in Web documents

In the text mining process, which step/task introduces structure to the corpus? A. Removing Stop Words B. Creating the Term-Document Matrix C. Establishing the Corpus D. Extracting Knowledge

B. Creating the Term-Document Matrix
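
A minimal sketch of how a term-document matrix gives structure to a corpus. The function name `term_document_matrix`, the whitespace tokenization, and the orientation (one row per document, one column per term) are assumptions made for this example.

```python
from collections import Counter

def term_document_matrix(corpus):
    """Return (terms, matrix) where matrix[i][j] counts terms[j] in document i."""
    # Count word frequencies per document using simple whitespace splitting.
    counts = [Counter(doc.lower().split()) for doc in corpus]
    # The column order is the sorted vocabulary of the whole corpus.
    terms = sorted(set().union(*counts))
    matrix = [[c[t] for t in terms] for c in counts]
    return terms, matrix

corpus = ["data mining finds patterns", "text mining finds text patterns"]
terms, matrix = term_document_matrix(corpus)
print(terms)
for row in matrix:
    print(row)
```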

The Web poses great challenges for effective and efficient knowledge discovery. Which of the following is NOT one of those challenges? A. The Web is too complex B. The Web is specific to a domain C. The Web is too big for effective data mining D. The Web is too dynamic

B. The Web is specific to a domain

The following figure shows a typical process of splitting the data set in data mining. What is the name of the set that is used for model development? A. Development data B. Training data C. Cross-validation data D. Validation data

B. Training data

Identifying and preventing incorrect claim payments and fraudulent activities falls under which type of data mining applications? A. computer hardware and software B. insurance C. retailing and logistics D. customer relationship management

B. insurance

What does the scalability of a data mining method refer to? A. its ability to predict the outcome of a previously unknown data set accurately B. its ability to construct a prediction model efficiently given a large amount of data C. its speed of computation and computational costs in using the model D. its ability to overcome noisy data to make somewhat accurate predictions

B. its ability to construct a prediction model efficiently given a large amount of data

Prediction problems where the variables have numeric values are most accurately defined as A. segmentation B. regression C. classification D. clustering

B. regression

In estimating the accuracy of data mining (or other) classification models, the true positive rate is A. the ratio of correctly classified positives divided by the sum of correctly classified positives and incorrectly classified positives. B. the ratio of correctly classified negatives divided by the total negative count. C. the ratio of correctly classified positives divided by the total positive count. D. the ratio of correctly classified positives divided by the sum of correctly classified positives and incorrectly classified negatives.

C. the ratio of correctly classified positives divided by the total positive count
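
The definition above can be checked with a small calculation. The function name `true_positive_rate` and the sample labels are assumptions made for this sketch; 1 marks a positive case and 0 a negative one.

```python
def true_positive_rate(actual, predicted):
    """Correctly classified positives divided by the total positive count."""
    tp = sum(1 for a, p in zip(actual, predicted) if a and p)
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)
    total_positives = tp + fn  # every actual positive is either caught or missed
    if total_positives == 0:
        raise ValueError("no positive cases in the actual labels")
    return tp / total_positives

actual = [1, 1, 1, 1, 0, 0]
predicted = [1, 1, 1, 0, 1, 0]
print(true_positive_rate(actual, predicted))  # 3 of 4 positives found -> 0.75
```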

Which discipline is related to and used in data mining? A. Statistics B. Management Science C. Information Visualization D. All of the others

D. All of the others

In text mining, the main purpose of this activity is to collect all the documents related to the context (domain of interest) being studied. What is this activity? A. Extracting Knowledge B. Removing Stop Words C. Creating the Term-Document Matrix D. Establishing the Corpus

D. Establishing the Corpus

What does advanced analytics for social media do? A. It helps identify your followers. B. It identifies links between groups. C. It identifies the biggest sources of influence online. D. It examines the content of online conversations.

D. It examines the content of online conversations

What do voice of the market (VOM) applications of sentiment analysis do? A. They examine the stock market for trends. B. They examine the "market of ideas" in politics. C. They examine employee sentiment in the organization. D. They examine customer sentiment at the aggregate level.

D. They examine customer sentiment at the aggregate level

What is one major way in which Web-based social media differs from traditional publishing media? A. Most Web-based media are operated by the government and large firms. B. Web-based media have a narrower range of quality. C. They use different languages of publication. D. They have different costs to own and operate.

D. They have different costs to own and operate

What is the goal of clustering? A. To create groups for decision tree analysis B. To create groups where members have maximum similarity to each other and have absolutely no similarity to members of other groups C. To create groups for a general regression analysis D. To create groups where members have maximum similarity to each other and have minimum similarity to members of other groups

D. To create groups where members have maximum similarity to each other and have minimum similarity to members of other groups

What are the main steps in the text mining process? A. Creating the Term-Document Matrix B. Extracting Knowledge C. Establishing the Corpus D. All of the others

D. All of the others

What does Web content mining involve? A. analyzing the PageRank and other metadata of a Web page B. analyzing the pattern of visits to a Web site C. analyzing the universal resource locator in Web pages D. analyzing the unstructured content of Web pages

D. analyzing the unstructured content of Web pages

Understanding which keywords your users enter to reach your Web site through a search engine can help you understand A. the type of Web browser being used by your Web site visitors. B. most of your Web site visitors' wants and needs. C. the hardware your Web site is running on. D. how well visitors understand your products.

D. how well visitors understand your products

Breaking up a Web page into its components to identify worthy words/terms and indexing them using a set of rules is called A. creating the term-by-document matrix. B. preprocessing the documents. C. document analysis. D. parsing the documents.

D. parsing the documents

Companies understand that when their product goes "viral," the content of the online conversations about their product does not matter, only the volume of conversations. True False

False

Data is the most critical ingredient for data mining (DM), and it never includes soft or unstructured data. True False

False

Data mining can be very useful in detecting patterns such as credit card fraud, but is of little help in improving sales. True False

False

Data mining is only for large firms that have lots of customer data. True False

False

Data mining provides instant, crystal-ball-like predictions. True False

False

Descriptive statistics for social media feature such items as your followers as well as the content in online conversations that help you to identify themes and sentiments. True False

False

In sentiment analysis, sentiment suggests a transient, temporary opinion reflective of one's feelings. True False

False

In the cancer research case study, data mining algorithms that predict cancer survivability with high predictive power are good replacements for medical professionals. True False

False

Search engines are only used in the context of the World Wide Web (WWW). True False

False

Since little can be done about visitor Web site abandonment rates, organizations have to focus their efforts on increasing the number of new visitors. True False

False

Text analytics is the subset of text mining that handles information retrieval and extraction, plus data mining. True False

False

Unlike other IS initiatives, a data mining project does not need to follow a systematic project management process to be successful. True False

False

Web mining is the process of discovering intrinsic relationships from Web data, which are expressed exclusively in the form of textual information. True False

False

A Web crawler is a piece of software that systematically browses (crawls through) the World Wide Web for the purpose of finding and fetching Web pages. True False

True

A source of data for data mining is often a consolidated data warehouse, but not always. True False

True

Articles and auxiliary verbs are assigned little value in text mining and are usually filtered out. True False

True
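
Filtering out such low-value words can be sketched as below. The `STOP_WORDS` set is a short, hand-picked list of articles and auxiliary verbs assumed for this example; it is not a standard stop-word list.

```python
# Illustrative stop-word list (articles and auxiliary verbs only); an assumption.
STOP_WORDS = {
    "a", "an", "the",
    "is", "are", "was", "were", "be", "been",
    "has", "have", "had", "do", "does", "did",
    "will", "would", "can", "could",
}

def remove_stop_words(tokens):
    """Keep only the tokens that are not in the stop-word list."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]

print(remove_stop_words(["the", "hotel", "was", "terrible"]))
```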

Categorization and clustering of documents during text mining differ only in the preselection of categories. True False

True

Clickstream analysis does not need users to enter their perceptions of the Web site or other feedback directly to be useful in determining their preferences. True False

True

Converting continuous valued numerical variables to ranges and categories is referred to as discretization. True False

True
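
A minimal sketch of discretization, assuming hypothetical age cut points of 30 and 60 and the labels shown. The function name `discretize` is also an assumption made for this example.

```python
from bisect import bisect_right

def discretize(value, cut_points, labels):
    """Map a continuous value to the label of the range it falls in."""
    # bisect_right counts how many cut points are <= value,
    # which is the index of the matching range label.
    return labels[bisect_right(cut_points, value)]

AGE_CUTS = [30, 60]                       # hypothetical boundaries
AGE_LABELS = ["young", "middle", "senior"]

for age in (25, 30, 59.9, 60):
    print(age, discretize(age, AGE_CUTS, AGE_LABELS))
```

With these boundaries, a value equal to a cut point falls into the higher range.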

Current use of sentiment analysis in voice of the customer applications allows companies to change their products or services in real time in response to customer sentiment. True False

True

During classification in data mining, a false positive is an occurrence classified as true by the algorithm while being false in reality. True False

True

Ensemble models for predictive analytics produce more robust and reliable predictions. True False

True

If the data accurately reflect the business or its customers, any company can use data mining. True False

True

In Web analytics "conversion" can be defined as a sale by an online purchase, a completed registration, an online submission, or any number of other Web activities. True False

True

In clustering the problem is to group an unlabeled collection of objects (e.g., documents, customer comments, Web pages) into meaningful clusters without any prior knowledge. True False

True

In data mining, Total Model Error is equal to bias error + variance error. True False

True

In data mining, classification models help in prediction. True False

True

In sentiment analysis, it is hard to classify some subjects such as news as good or bad, but easier to classify others, e.g., movie reviews, in the same way. True False

True

In sentiment analysis, only subjective expressions (sentences) are classified as negative or positive, and objective expressions should be excluded from the analysis. True False

True

One of the common mistakes in data mining is to look only at aggregated results and not at individual records and predictions. True False

True

Regional accents present challenges for natural language processing. True False

True

SEO is the intentional activity of affecting the visibility of an e-commerce site or a Web site in a search engine's natural (unpaid or organic) search results. True False

True

The cost of data storage has plummeted recently, making data mining feasible for more firms. True False

True

The main idea in generating association rules (or solving market-basket problems) is to identify the frequent sets that go together. True False

True
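
Finding item sets that go together usually rests on two measures, support and confidence. The sketch below computes both for a rule on a made-up list of baskets; the function names and the data are assumptions for illustration, and this is not a full frequent-itemset algorithm such as Apriori.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Support of the combined itemset divided by support of the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Made-up shopping baskets for the example.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

print(support(baskets, {"bread", "milk"}))       # 2 of 4 baskets
print(confidence(baskets, {"bread"}, {"milk"}))  # 2 of the 3 bread baskets
```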

The query analyzer is responsible for receiving a search request from the user (via the search engine's Web server interface) and converting it into a standardized data structure, so that it can be easily queried/matched against the entries in the document database. True False

True

