ITM 4273


Which of the following correctly defines a text mining term? A) Tagging is the number of times a word is found in a specific document. B) A token is an uncategorized block of text in a sentence. C) Rooting is the process of reducing inflected words to their base form. D) A term is a single word or multiword phrase extracted directly from the corpus by means of NLP methods.

A term is a single word or multiword phrase extracted directly from the corpus by means of NLP methods.

Why will computers probably not be able to understand natural language the same way and with the same accuracy that humans do? A) A true understanding of meaning requires extensive knowledge of a topic beyond what is in the words, sentences, and paragraphs. B) The natural human language is too specific. C) The part of speech depends only on the definition and not on the context within which it is used. D) All of the above.

A true understanding of meaning requires extensive knowledge of a topic beyond what is in the words, sentences, and paragraphs.

Why will computers probably not be able to understand natural language the same way and with the same accuracy that humans do?

Natural human language is too vague for computers to understand fully, and a true understanding of meaning requires extensive knowledge of a topic beyond what is in the words, sentences, and paragraphs.

Text mining is the semi-automated process of extracting ________ from large amounts of unstructured data sources. A) patterns B) useful information C) knowledge D) all of the above

all of the above

Forward-thinking companies like Ask.com, Scholastic, and St. John Health System are actively using Web mining systems to answer important questions of "Who?" "Why?" and "How?" The benefits of integrating these systems: A) are measured qualitatively in terms of customer satisfaction, but not measured using financial or other quantitative measures. B) can be significant in terms of incremental financial growth and increasing customer loyalty and satisfaction. C) have not yet outweighed the costs of the Web mining systems and analysis. D) can be infinitely measurable.

can be significant in terms of incremental financial growth and increasing customer loyalty and satisfaction.

Analysis of the information collected by Web servers can help better understand user behavior. Analysis of this data is called ________ analysis.

clickstream
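Clickstream analysis usually starts by turning raw server log entries into per-visitor sessions (the sequence of pages each visitor viewed). The sketch below is a minimal illustration under stated assumptions: the simplified log format (visitor IP, ISO timestamp, path), the chronological ordering of the lines, and the 30-minute inactivity timeout are all assumptions for this example, not a standard log format.

```python
from datetime import datetime, timedelta

# Assumed simplified log format: "<visitor_ip> <ISO-8601 timestamp> <requested path>"
# Assumed: lines are already sorted by time.
SESSION_TIMEOUT = timedelta(minutes=30)  # common heuristic, assumed here

def sessionize(log_lines):
    """Group page requests into sessions per visitor (clickstream paths)."""
    sessions = {}   # visitor -> list of sessions, each a list of paths
    last_seen = {}  # visitor -> timestamp of that visitor's previous request
    for line in log_lines:
        ip, ts, path = line.split()
        when = datetime.fromisoformat(ts)
        # Start a new session for a new visitor or after a long gap
        if ip not in sessions or when - last_seen[ip] > SESSION_TIMEOUT:
            sessions.setdefault(ip, []).append([])
        sessions[ip][-1].append(path)
        last_seen[ip] = when
    return sessions

log = [
    "10.0.0.1 2024-01-05T10:00:00 /home",
    "10.0.0.1 2024-01-05T10:02:10 /products",
    "10.0.0.2 2024-01-05T10:03:00 /home",
    "10.0.0.1 2024-01-05T11:15:00 /home",
]
print(sessionize(log))
```

Visitor 10.0.0.1 ends up with two sessions because more than 30 minutes pass between the second and third requests.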

In ________, the problem is to group an unlabelled collection of objects, such as documents, customer comments, and Web pages into meaningful groups without any prior knowledge. A) search recall B) classification C) clustering D) grouping

clustering

_______ is the grouping of similar documents without having a predefined set of categories.

clustering
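To make "no predefined set of categories" concrete, here is a minimal sketch of unsupervised document grouping. It swaps in a deliberately simple single-pass "leader" clustering with Jaccard similarity on word sets; text mining tools more commonly run k-means or hierarchical clustering on term-document vectors. The 0.2 similarity threshold and the sample documents are assumptions for illustration.

```python
def jaccard(a, b):
    """Similarity of two word sets: |intersection| / |union|."""
    return len(a & b) / len(a | b) if a | b else 0.0

def leader_cluster(docs, threshold=0.2):
    """Group documents without predefined categories (unsupervised).

    Each document joins the first cluster whose seed document is similar
    enough; otherwise it becomes the seed of a new cluster.
    """
    clusters = []  # each entry: (seed word set, list of document indices)
    for i, doc in enumerate(docs):
        words = set(doc.lower().split())
        for seed, members in clusters:
            if jaccard(words, seed) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((words, [i]))
    return [members for _, members in clusters]

docs = [
    "battery life of this phone is great",
    "phone battery life is poor",
    "the hotel room was clean and quiet",
    "quiet hotel with a clean pool",
]
print(leader_cluster(docs))  # [[0, 1], [2, 3]]
```

No labels were supplied, yet the phone reviews and the hotel reviews fall into separate groups purely from shared vocabulary.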

The ________ model, which is one where multiple sources of data describing the same population are integrated to increase the depth and richness of the resulting analysis, forms the framework of the Web site optimization ecosystem.

convergent validation

At a very high level, the first of three consecutive tasks in the text mining process is to establish the ________, which is a list of organized documents.

corpus

In linguistics, a(n) ________ is a large and structured set of texts prepared for the purpose of conducting knowledge discovery.

corpus

At a very high level, the text mining process consists of each of the following tasks except: A) create log frequencies B) establish the corpus C) create the term-document matrix D) extract the knowledge

create log frequencies

________ is the process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data stored in structured databases, where the data are organized in records structured by categorical, ordinal, or continuous variables.

data mining

All of the following are popular application areas of text mining except: A) information extraction B) document summarization C) question answering D) data structuring

data structuring

In text mining, a list of "stop words" is used to ________ commonly used words.

eliminate

Commercial software tools include all of the following except: A) GATE B) IBM Intelligent Miner Data Mining Suite C) SAS Text Miner D) SPSS Text Mining

GATE

A ________ is one or more Web pages that provide a collection of links to authoritative pages, reference sites, or a resource list on a specific topic. A) hub B) hyperlink-induced topic search C) spoke D) community

hub

A(n) ________ is one or more Web pages that provide a collection of links to authoritative pages.

hub
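The hub concept comes from Kleinberg's hyperlink-induced topic search (HITS) algorithm, in which good hubs link to good authorities and good authorities are linked to by good hubs. The sketch below is a minimal version of that mutual-reinforcement iteration; the tiny link graph and page names are hypothetical, and it runs a fixed number of iterations instead of testing for convergence.

```python
import math

def hits(links, iterations=20):
    """Compute hub and authority scores for a directed link graph."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of hub scores of the pages that link to p
        auth = {p: sum(hub[q] for q in pages if p in links.get(q, ())) for p in pages}
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {p: v / norm for p, v in auth.items()}
        # Hub score: sum of authority scores of the pages p links to
        hub = {p: sum(auth[t] for t in links.get(p, ())) for p in pages}
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {p: v / norm for p, v in hub.items()}
    return hub, auth

# Hypothetical graph: one page is a resource list pointing to several references
links = {
    "resource_list": ["ref_a", "ref_b", "ref_c"],
    "blog": ["ref_a"],
    "ref_a": [],
}
hub, auth = hits(links)
print(max(hub, key=hub.get))    # resource_list
print(max(auth, key=auth.get))  # ref_a
```

The page that collects many links scores highest as a hub, while the page linked to by both hubs scores highest as an authority.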

All of the following are types of data generated through Web page visits except: A) data stored in server access logs, referrer logs, agent logs, and client-side cookies B) user profiles C) hyperlink analysis D) metadata, such as page attributes, content attributes, and usage data

hyperlink analysis

One of the main approaches to text classification is ________ in which an expert's knowledge is encoded into the system either declaratively or in the form of procedural classification rules.

knowledge engineering

The two main approaches to text classification are ________ and ________. A) knowledge engineering; machine learning B) categorization; clustering C) association; trend analysis D) knowledge extraction; association

knowledge engineering; machine learning
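In the knowledge engineering approach, an expert's rules are written into the system directly; in the machine learning approach, a classifier instead learns from labeled example documents. The sketch below illustrates only the first approach, with declarative keyword rules. The categories and keywords are hypothetical.

```python
import re

# Hypothetical expert knowledge, encoded declaratively as (category, keywords)
RULES = [
    ("billing", {"invoice", "charge", "charged", "refund"}),
    ("technical", {"error", "crash", "login"}),
]

def classify(text, default="other"):
    """Return the first category whose keywords appear in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    for category, keywords in RULES:
        if words & keywords:
            return category
    return default

print(classify("I was charged twice, please refund me"))  # billing
print(classify("The app shows an error at login"))        # technical
print(classify("Great product, fast shipping"))           # other
```

Every behavior here comes from rules a person wrote; changing the categories means editing the rules rather than retraining on new labeled data.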

Fundamental to the optimization process is ________, gathering data and information that can then be transformed into tangible analysis and recommendations for improvement using Web mining tools and techniques.

measurement

________ is a branch of the field of linguistics and a part of natural language processing that studies the internal structure of words. A) Morphology B) Corpus C) Stemming D) Polysemes

morphology

It has been shown that the bag-of-words method may not produce good enough information content for text mining tasks. More advanced techniques such as ________ are needed. A) classification B) natural language processing C) evidence-based processing D) symbolic processing

natural language processing
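One reason bag-of-words can fall short: it keeps only word counts and throws away word order and syntax, so sentences with opposite meanings can get identical representations. A minimal illustration (the sentences are made up):

```python
from collections import Counter

def bag_of_words(sentence):
    """Represent a sentence only by word counts, discarding order and syntax."""
    return Counter(sentence.lower().split())

a = bag_of_words("dog bites man")
b = bag_of_words("man bites dog")
print(a == b)  # True: opposite meanings, identical bags
```

Recovering who did what to whom requires the kind of syntactic analysis that NLP techniques provide.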

________ is an important component of text mining and is a subfield of artificial intelligence and computational linguistics. It studies the problem of understanding the natural human language.

natural language processing (NLP)

Web analytics, CEM, and VOC applications form the foundation of the Web site ________ ecosystem that supports the online business' ability to positively influence desired outcomes.

optimization

Using ________ as a rich source of knowledge and a strategic weapon, Kodak not only survives but excels in its market segment defined by innovation and constant change. A) visualization B) deception detection C) patent analysis D) semantic cues

patent analysis

When registered users revisit Amazon.com, they are greeted by name. This task involves recognizing the user by ________. A) pattern discovery B) association C) text mining D) reading a cookie

reading a cookie
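The general idea behind recognizing a returning visitor: on an earlier visit the site stores an identifier in a cookie, the browser sends it back in the `Cookie` request header, and the server looks that identifier up. The sketch below uses Python's standard `http.cookies.SimpleCookie` to parse the header. The cookie name `visitor_id` and the user table are hypothetical; the source does not describe Amazon's actual implementation.

```python
from http.cookies import SimpleCookie

# Hypothetical lookup table standing in for a site's customer database
REGISTERED_USERS = {"u-1029": "Alice"}

def greet(cookie_header):
    """Recognize a returning visitor from the Cookie header the browser sends."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("visitor_id")  # cookie name is an assumption
    if morsel and morsel.value in REGISTERED_USERS:
        return f"Hello, {REGISTERED_USERS[morsel.value]}!"
    return "Hello, guest!"

print(greet("visitor_id=u-1029; theme=dark"))  # Hello, Alice!
print(greet("theme=dark"))                     # Hello, guest!
```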

________ analysis is a technique used to detect favorable and unfavorable opinions toward specific products and services from textual data sources, such as customer feedback in Web postings; it can also be used to detect unfavorable rumors.

sentiment
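A minimal sketch of lexicon-based sentiment scoring: count positive and negative opinion words and compare. The tiny word lists are hypothetical; practical systems use large curated lexicons or trained classifiers.

```python
import re

# Tiny hypothetical opinion lexicon
POSITIVE = {"great", "love", "excellent", "fast", "reliable"}
NEGATIVE = {"poor", "broken", "slow", "hate", "refund"}

def sentiment(text):
    """Label text positive, negative, or neutral by counting lexicon hits."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

reviews = [
    "Love this camera, excellent battery and fast focus",
    "Arrived broken and support was slow. I want a refund",
    "It is a camera",
]
print([sentiment(r) for r in reviews])  # ['positive', 'negative', 'neutral']
```

This word-counting approach ignores negation and context ("not great" still counts as positive), which is one reason more advanced NLP methods are used in practice.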

__________ is the process of reducing inflected words to their base or root form.

stemming
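A minimal sketch of stemming by suffix stripping. The suffix list and the minimum stem length are assumptions for illustration; real stemmers such as the Porter stemmer apply a much more careful, multi-step rule set.

```python
# Hypothetical suffix list, checked in order (longer variants first)
SUFFIXES = ["ings", "ions", "ing", "ion", "ed", "es", "s"]

def naive_stem(word, min_stem=3):
    """Strip the first matching suffix, keeping at least min_stem characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word

print([naive_stem(w) for w in ["connected", "connecting", "connection", "connects"]])
# ['connect', 'connect', 'connect', 'connect']
```

The `min_stem` guard keeps short words such as "sing" from being cut down to a meaningless stub.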

______ words or noise words are words that are filtered out prior to or after processing of natural language data.

stop
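A minimal sketch of stop-word removal: tokenize, then drop words on a stop list. The short list below is an assumption for illustration; text mining tools ship longer, language-specific lists.

```python
import re

# Small illustrative stop list
STOP_WORDS = {"a", "an", "the", "of", "and", "or", "is", "are", "to",
              "in", "on", "for", "with", "this"}

def remove_stop_words(text):
    """Tokenize on letters and drop stop words (noise words)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

print(remove_stop_words("The quality of the screen is excellent for a laptop in this price range"))
# ['quality', 'screen', 'excellent', 'laptop', 'price', 'range']
```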

In the text mining process, the output of task two is a flat file called a ________ matrix where the cells are populated with the term frequencies.

term-document
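A minimal sketch of building a term-document matrix with raw term frequencies. Here rows are documents and columns are terms (some tools use the transpose). Stop-word removal and stemming, which would normally come first, are skipped to keep the example short, so "in" still appears as a term.

```python
from collections import Counter

def term_document_matrix(docs):
    """Rows = documents, columns = terms, cells = raw term frequencies."""
    counts = [Counter(doc.lower().split()) for doc in docs]
    terms = sorted(set().union(*counts))               # vocabulary across the corpus
    matrix = [[c[t] for t in terms] for c in counts]   # Counter returns 0 for absent terms
    return terms, matrix

docs = ["data mining finds patterns", "text mining finds patterns in text"]
terms, matrix = term_document_matrix(docs)
print(terms)  # ['data', 'finds', 'in', 'mining', 'patterns', 'text']
for row in matrix:
    print(row)
# [1, 1, 0, 1, 1, 0]
# [0, 1, 1, 1, 1, 2]
```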

________ is the semi-automated process of extracting patterns from large amounts of unstructured data sources.

text mining

Why does the Web pose great challenges for effective and efficient knowledge discovery? A) The Web search engines are indexed-based. B) The Web is too dynamic. C) The Web is too specific to a domain. D) The Web infrastructure contains hyperlink information.

The Web is too dynamic.

What is the primary purpose of text mining within the context of knowledge discovery?

To process unstructured (textual) data along with structured data, if relevant to the problem, to extract meaningful and actionable patterns for better decision making.

A vast majority of business data are stored in text documents that are ________. A) mostly quantitative B) virtually unstructured C) semi-structured D) highly structured

virtually unstructured

______ applications focus on "who and how" questions by gathering and reporting direct feedback from site visitors, by benchmarking against other sites and offline channels, and by supporting predictive modeling of future visitor behavior.

voice of customer

A simple keyword-based search engine suffers from several deficiencies, which include all of the following except: A) a topic of any breadth can easily contain hundreds or thousands of documents B) many documents that are highly relevant to a topic may not contain the exact keywords defining them C) Web mining can identify authoritative Web pages D) many of the search results are marginally or not relevant to the topic

Web mining can identify authoritative Web pages

Which of the following is not one of the three main areas of Web mining? A) Web search mining B) Web content mining C) Web structure mining D) Web usage mining

Web search mining

______ mining is the process of extracting useful information from the links embedded in Web documents.

Web structure

Which of the following refers to developing useful information from the links included in the Web documents? A) Web content mining B) Web subject mining C) Web structure mining D) Web matter mining

Web structure mining

_______ mining is the extraction of useful information from data generated through Web page visits and transactions.

Web usage
