Chapter 5 Business Intelligence

Identify, with a brief description, each of the four steps in the sentiment analysis process.

1. Sentiment Detection: Here the goal is to differentiate between a fact and an opinion, which may be viewed as classification of text as objective or subjective.
2. N-P Polarity Classification: Given an opinionated piece of text, the goal is to classify the opinion as falling under one of two opposing sentiment polarities, or locate its position on the continuum between these two polarities.
3. Target Identification: The goal of this step is to accurately identify the target of the expressed sentiment.
4. Collection and Aggregation: In this step all text data points in the document are aggregated and converted to a single sentiment measure for the whole document.
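The four steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how a production sentiment system works: the word lexicon, the target list, and all function names are hypothetical, and real systems use trained classifiers rather than word lookup.

```python
# Toy lexicon and target list: hypothetical values for illustration only.
LEXICON = {"great": 1, "love": 1, "sharp": 1,
           "terrible": -1, "slow": -1, "laughable": -1}
TARGETS = {"screen", "battery", "service"}


def words(sentence):
    """Lowercase the sentence and strip basic punctuation from each word."""
    return [w.strip(".,!?").lower() for w in sentence.split()]


def is_subjective(sentence):
    """Step 1, sentiment detection: treat a sentence as an opinion
    if it contains at least one lexicon word."""
    return any(w in LEXICON for w in words(sentence))


def polarity(sentence):
    """Step 2, N-P polarity classification: sum the word scores
    (positive total = positive opinion, negative total = negative)."""
    return sum(LEXICON.get(w, 0) for w in words(sentence))


def target(sentence):
    """Step 3, target identification: first known target mentioned."""
    for w in words(sentence):
        if w in TARGETS:
            return w
    return None


def document_sentiment(sentences):
    """Step 4, collection and aggregation: one score for the document."""
    scored = [(target(s), polarity(s)) for s in sentences if is_subjective(s)]
    total = sum(p for _, p in scored)
    return scored, total


review = ["The screen is great.", "The battery is slow.",
          "It weighs 300 grams.", "I love the service!"]
scored, total = document_sentiment(review)
print(scored, total)
```

The factual sentence ("It weighs 300 grams.") is dropped at step 1, so only the three opinionated sentences contribute to the document-level score.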

What is the difference between white hat and black hat SEO activities?

An SEO technique is considered white hat if it conforms to the search engines' guidelines and involves no deception. Because search engine guidelines are not written as a series of rules or commandments, this is an important distinction to note. White-hat SEO is not just about following guidelines, but about ensuring that the content a search engine indexes and subsequently ranks is the same content a user will see. Black-hat SEO attempts to improve rankings in ways that the search engines disapprove of, or that involve deception or try to divert search engine algorithms from their intended purpose.

________ is a segmentation metric for social networks that measures the strength of the bonds between actors in a social network.

Cohesion

________ statistics help you understand whether your specific marketing objective for a Web page is being achieved.

Conversion

At a very high level, the text mining process can be broken down into three consecutive tasks, the first of which is to establish the ________.

Corpus

IBM's Watson utilizes a massively parallel, text mining-focused, probabilistic evidence-based computational architecture called ________.

DeepQA

What are the three categories of social media analytics technologies and what do they do?

• Descriptive analytics: Uses simple statistics to identify activity characteristics and trends, such as how many followers you have, how many reviews were generated on Facebook, and which channels are being used most often.
• Social network analysis: Follows the links between friends, fans, and followers to identify connections of influence as well as the biggest sources of influence.
• Advanced analytics: Includes predictive analytics and text analytics that examine the content in online conversations to identify themes, sentiments, and connections that would not be revealed by casual surveillance.

Companies understand that when their product goes "viral," the content of the online conversations about their product does not matter, only the volume of conversations.

False

Consistent high quality, higher publishing frequency, and longer time lag are all attributes of industrial publishing when compared to Web publishing.

False

Descriptive analytics for social media feature such items as your followers as well as the content in online conversations that help you to identify themes and sentiments.

False

In sentiment analysis, sentiment suggests a transient, temporary opinion reflective of one's feelings.

False

In the car insurance case study, text mining was used to identify auto features that caused injuries.

False

In the evolution of social media user engagement, the largest recent change is the growth of creators.

False

Search engine optimization (SEO) techniques play a minor role in a Web site's search ranking because only well-written content matters.

False

Search engines are only used in the context of the World Wide Web (WWW).

False

Since little can be done about visitor Web site abandonment rates, organizations have to focus their efforts on increasing the number of new visitors.

False

Text analytics is the subset of text mining that handles information retrieval and extraction, plus data mining.

False

Web-based media has nearly identical cost and scale structures as traditional media.

False

In the security domain, one of the largest and most prominent text mining applications is the highly classified ECHELON surveillance system. What is ECHELON assumed to be capable of doing?

Identifying the content of telephone calls, faxes, e-mails, and other types of data and intercepting information sent via satellites, public switched telephone networks, and microwave links

Why are the users' page views and time spent on your Web site important metrics?

If people come to your Web site and don't view many pages, that is undesirable and your Web site may have issues with its design or structure. Another explanation for low page views is a disconnect between the marketing messages that brought them to the site and the content that is actually available. Generally, the longer a person spends on your Web site, the better it is. That could mean they're carefully reviewing your content, utilizing interactive components you have available, and building toward an informed decision to buy, respond, or take the next step you've provided. On the other hand, the time on site also needs to be examined against the number of pages viewed to make sure the visitor isn't spending his or her time trying to locate content that should be more readily accessible.

How would you describe information extraction in text mining?

Information extraction is the identification of key phrases and relationships within text by looking for predefined objects and sequences in text by way of pattern matching.
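A minimal sketch of extraction by pattern matching, assuming a single hypothetical relationship pattern ("X acquired Y") and made-up company names. Real information extraction uses many patterns and handles far more variation in wording.

```python
import re

# Hypothetical predefined sequence: a capitalized name, the word "acquired",
# then another capitalized name.
PATTERN = re.compile(r"([A-Z][a-zA-Z]+) acquired ([A-Z][a-zA-Z]+)")


def extract_acquisitions(text):
    """Return (buyer, bought) pairs for every match of the pattern."""
    return PATTERN.findall(text)


text = ("Last year Alpha acquired Beta. "
        "Analysts say Gamma acquired Delta in March.")
pairs = extract_acquisitions(text)
print(pairs)
```

Each returned tuple is one extracted relationship between two predefined object types, which is the sense of "predefined objects and sequences" in the definition above.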

What does advanced analytics for social media do?

It examines the content of online conversations.

Natural language processing (NLP), a subfield of artificial intelligence and computational linguistics, is an important component of text mining. What is the definition of NLP?

NLP is a discipline that studies the problem of understanding the natural human language, with the view of converting depictions of human language into more formal representations in the form of numeric and symbolic data that are easier for computer programs to manipulate.

________ Web analytics refers to measurement and analysis of data relating to your company that takes place outside your Web site.

Off-site

________, also called homonyms, are syntactically identical words with different meanings.

Polysemes

________ is a connections metric for social networks that measures the ties that actors in a network have with others that are geographically close.

Propinquity

What is search engine optimization (SEO) and why is it important for organizations that own Web sites?

Search engine optimization (SEO) is the intentional activity of affecting the visibility of an e-commerce site or a Web site in a search engine's natural (unpaid or organic) search results. In general, the higher a site ranks on the search results page, and the more frequently it appears in the search results list, the more visitors it will receive from the search engine's users. Being indexed by search engines like Google, Bing, and Yahoo! is not good enough for businesses. Getting ranked on the most widely used search engines and getting ranked higher than your competitors are what make the difference.

________ is a technique used to detect favorable and unfavorable opinions toward specific products and services using large numbers of textual data sources.

Sentiment analysis

In what ways does the Web pose great challenges for effective and efficient knowledge discovery through data mining?

• The Web is too big for effective data mining. The Web is so large and growing so rapidly that it is difficult to even quantify its size. Because of the sheer size of the Web, it is not feasible to set up a data warehouse to replicate, store, and integrate all of the data on the Web, making data collection and integration a challenge.
• The Web is too complex. The complexity of a Web page is far greater than a page in a traditional text document collection. Web pages lack a unified structure. They contain far more authoring style and content variation than any set of books, articles, or other traditional text-based document.
• The Web is too dynamic. The Web is a highly dynamic information source. Not only does the Web grow rapidly, but its content is constantly being updated. Blogs, news stories, stock market results, weather reports, sports scores, prices, company advertisements, and numerous other types of information are updated regularly on the Web.
• The Web is not specific to a domain. The Web serves a broad diversity of communities and connects billions of workstations. Web users have very different backgrounds, interests, and usage purposes. Most users may not have good knowledge of the structure of the information network and may not be aware of the heavy cost of a particular search that they perform.
• The Web has everything. Only a small portion of the information on the Web is truly relevant or useful to someone (or some task). Finding the portion of the Web that is truly relevant to a person and the task being performed is a prominent issue in Web-related research.

In sentiment analysis, which of the following is an implicit opinion?

The customer service I got for my TV was laughable.

What do voice of the market (VOM) applications of sentiment analysis do?

They examine customer sentiment at the aggregate level.

What is one major way in which Web-based social media differs from traditional publishing media?

They have different costs to own and operate.

Describe the query-specific clustering method as it relates to clustering.

This method employs a hierarchical clustering approach where the most relevant documents to the posed query appear in small tight clusters that are nested in larger clusters containing less similar documents, creating a spectrum of relevance levels among the documents.

Articles and auxiliary verbs are assigned little value in text mining and are usually filtered out.

True

Categorization and clustering of documents during text mining differ only in the preselection of categories.

True

Clickstream analysis does not need users to enter their perceptions of the Web site or other feedback directly to be useful in determining their preferences.

True

Current use of sentiment analysis in voice of the customer applications allows companies to change their products or services in real time in response to customer sentiment.

True

In sentiment analysis, it is hard to classify some subjects such as news as good or bad, but easier to classify others, e.g., movie reviews, in the same way.

True

In text mining, if an association between two concepts has 7% support, it means that 7% of the documents had both concepts represented in the same document.

True
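The support figure can be computed directly from its definition. The sketch below assumes each document has already been reduced to a set of concepts; the four sample documents and the `support` function name are made up for illustration.

```python
def support(documents, concept_a, concept_b):
    """Fraction of documents in which both concepts appear together."""
    both = sum(1 for doc in documents
               if concept_a in doc and concept_b in doc)
    return both / len(documents)


# Each document is represented as the set of concepts found in it.
docs = [
    {"price", "quality"},
    {"price"},
    {"quality", "service"},
    {"price", "quality", "service"},
]
s = support(docs, "price", "quality")
print(f"{s:.0%}")
```

Here two of the four documents contain both concepts, so the association has 50% support. A 7% support value would mean the same thing at a ratio of 7 in 100 documents.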

In the Tito's Vodka case study, trends in cocktails were studied to create a quarterly recipe for customers.

True

In the Wimbledon case study, designers balanced the needs of mobile and desktop computer users.

True

Regional accents present challenges for natural language processing.

True

Which of the following statements about Web site conversion statistics is FALSE?

Visitors who begin a purchase on most Web sites must complete it.

________ is mostly driven by sentiment analysis and is a key element of customer experience management initiatives, where the goal is to create an intimate relationship with the customer.

Voice of the customer (VOC)

Search engine optimization (SEO) is a means by which

Web site developers can increase Web site search rankings.

Web site usability may be rated poor if

Web site visitors download few of your offered PDFs and videos.

In text analysis, what is a lexicon?

a catalog of words, their synonyms, and their meanings

In the opening vignette, the architectural system that supported Watson used all the following elements EXCEPT

a core engine that could operate seamlessly in another domain without changes.

Natural language processing (NLP) is associated with which of the following areas?

all of these

What does Web content mining involve?

analyzing the unstructured content of Web pages

In text mining, tokenizing is the process of

categorizing a block of text in a sentence

In the Lotte.com retail case, the company deployed SAS for Customer Experience Analytics to better understand the quality of customer traffic on their Web site, classify order rates, and see which ________ had the most visitors.

channels

In the Tito's Vodka case, it was important that social media users all had a(n) ________ brand experience.

consistent

Web ________ are used to automatically read through the contents of Web sites.

crawlers/spiders

Because the term document matrix is often very large and rather sparse, an important optimization step is to reduce the ________ of the matrix.

dimensionality
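One simple way to reduce dimensionality is to drop stop words and terms that occur in very few documents. The sketch below is only one of several possible approaches, with a hypothetical stop-word list and threshold; other techniques, such as singular value decomposition, are also used for this step.

```python
from collections import Counter

# Hypothetical stop-word list: articles, auxiliary verbs, and similar
# low-value words that are usually filtered out.
STOP_WORDS = {"the", "a", "is", "and", "of"}


def term_document_matrix(docs):
    """Build a matrix with one row per document and one column per term."""
    counts = [Counter(d.lower().split()) for d in docs]
    terms = sorted(set().union(*counts))
    matrix = [[c[t] for t in terms] for c in counts]
    return terms, matrix


def reduce_terms(terms, matrix, min_docs=2):
    """Keep only non-stop-word terms found in at least min_docs documents."""
    keep = [j for j, t in enumerate(terms)
            if t not in STOP_WORDS
            and sum(1 for row in matrix if row[j] > 0) >= min_docs]
    return [terms[j] for j in keep], [[row[j] for j in keep] for row in matrix]


docs = ["the battery is weak", "the screen is sharp", "battery and screen"]
terms, matrix = term_document_matrix(docs)
small_terms, small_matrix = reduce_terms(terms, matrix)
print(len(terms), "->", len(small_terms), small_terms)
```

The full matrix has seven term columns, most of them zero in any given row. After filtering, only the two terms shared across documents remain.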

All of the following are challenges associated with natural language processing EXCEPT

dividing up a text into individual words in English.

Understanding which keywords your users enter to reach your Web site through a search engine can help you understand

how well visitors understand your products.

A(n) ________ is one or more Web pages that provide a collection of links to authoritative Web pages.

hub

Web pages contain both unstructured information and ________, which are connections to other Web pages.

hyperlinks

In the Mining for Lies case study, a text based deception-detection method used by Fuller and others in 2008 was based on a process known as ________, which relies on elements of data and text mining techniques.

message feature mining

What are the two main types of Web analytics?

off-site and on-site Web analytics

Breaking up a Web page into its components to identify worthy words/terms and indexing them using a set of rules is called

parsing the documents.

When viewed as a binary feature, ________ classification is the binary classification task of labeling an opinionated document as expressing either an overall positive or an overall negative opinion.

polarity

A(n) ________ Web site contains links that send traffic directly to your Web site.

referral

A(n) ________ engine is a software program that searches for Web sites or files based on keywords.

search

In the Wimbledon case study, the tournament used data for each match in real time to highlight

significant events.

What types of documents are BEST suited to semantic labeling and aggregation to determine sentiment orientation?

small- to medium-sized documents

In the research literature case study, the researchers analyzing academic papers extracted information from which source?

the paper abstract

Sentiment analysis projects require a lexicon for use. If a project in English is undertaken, you must generally make sure to

use an English lexicon appropriate to the project at your discretion.

When a word has more than one meaning, selecting the meaning that makes the most sense can only be accomplished by taking into account the context within which the word is used. This concept is known as

word sense disambiguation

