Exam 2 - BI
Natural language processing (NLP) is associated with which of the following areas? - Text Mining - Artificial Intelligence - Computational linguistics - All of these
All of these
________ statistics help you understand whether your specific marketing objective for a Web page is being achieved.
Conversion
IBM's Watson utilizes a massively parallel, text mining-focused, probabilistic evidence-based computational architecture called ________.
DeepQA
Which of the following is the order of simulation methodology?
Define the problem, Construct the simulation model, Test and validate the model, Design the experiment, Conduct the experiment, Evaluate the results, Implement the results.
A decision table shows the relationships of the problem graphically and can handle complex situations in a compact form. T/F
False
A model builder makes predictions and assumptions regarding input data, many of which deal with the assessment of certain futures. T/F
False
All quantitative models are typically made up of six basic components. T/F
False
Business analysis is the monitoring, scanning, and interpretation of collected environmental information. T/F
False
Companies understand that when their product goes "viral," the content of the online conversations about their product does not matter, only the volume of conversations. T/F
False
Consistent high quality, higher publishing frequency, and longer time lag are all attributes of industrial publishing when compared to Web publishing. T/F
False
Descriptive analytics for social media feature such items as your followers as well as the content in online conversations that help you to identify themes and sentiments. T/F
False
In decision making under uncertainty, it is assumed that complete knowledge is available. T/F
False
In the car insurance case study, text mining was used to identify auto features that caused injuries. T/F
False
Result variables are considered independent variables. T/F
False
Web-based media has nearly identical cost and scale structures as traditional media. T/F
False
________ is performed by indicating a target cell, its desired value, and a changing cell.
Goal seeking
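Goal seeking can be sketched in a few lines. The example below is a minimal illustration, not how any particular spreadsheet implements it: it uses bisection, and the `profit` model with its price, unit cost, and fixed cost figures is hypothetical. The target cell is the model output, the desired value is `target`, and the changing cell is the single input being searched.

```python
def goal_seek(model, target, low, high, tol=1e-9, max_iter=200):
    """Find x in [low, high] with model(x) close to target, using bisection.

    Assumes model is continuous and that the target is bracketed, i.e.
    model(low) - target and model(high) - target have opposite signs.
    """
    f_low = model(low) - target
    f_high = model(high) - target
    if f_low * f_high > 0:
        raise ValueError("target is not bracketed by [low, high]")
    for _ in range(max_iter):
        mid = (low + high) / 2
        f_mid = model(mid) - target
        if abs(f_mid) < tol:
            return mid
        if f_low * f_mid < 0:
            high = mid              # root lies in the lower half
        else:
            low, f_low = mid, f_mid  # root lies in the upper half
    return (low + high) / 2


def profit(units):
    # Hypothetical model: price 25, unit cost 15, fixed cost 5000.
    return 25 * units - 15 * units - 5000


# Changing cell: units. Target cell: profit. Desired value: 0 (break-even).
break_even_units = goal_seek(profit, target=0, low=0, high=10_000)
print(round(break_even_units, 4))
```

Setting the desired value to zero profit turns the same search into a break-even calculation.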
In the security domain, one of the largest and most prominent text mining applications is the highly classified ECHELON surveillance system. What is ECHELON assumed to be capable of doing?
Identifying the content of telephone calls, faxes, e-mails, and other types of data and intercepting information sent via satellites, public switched telephone networks, and microwave links
What does advanced analytics for social media do?
It examines the content of online conversations.
________, like data, must be managed to maintain their integrity, and thus their applicability.
Models
The most common simulation method for business decision problems is the ________ simulation.
Monte Carlo
Of the available solutions, at least one is the best, in the sense that the degree of goal attainment associated with it is the highest; this is called a(n) ________ solution.
Optimal
_______ analysis attempts to assess the impact of a change in the input data or parameters on the proposed solution.
Sensitivity
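A minimal sketch of the idea, assuming a hypothetical profit model and made-up base inputs: change one input by a percentage, hold the others fixed, and measure how far the result variable moves.

```python
def profit(price, units, unit_cost, fixed_cost):
    # Hypothetical model used only for illustration.
    return (price - unit_cost) * units - fixed_cost


base_inputs = {"price": 25.0, "units": 1000, "unit_cost": 15.0, "fixed_cost": 5000.0}


def sensitivity(model, inputs, name, pct_change):
    """Change in the model output when one input is changed by pct_change."""
    changed = dict(inputs)
    changed[name] = inputs[name] * (1 + pct_change)
    return model(**changed) - model(**inputs)


# Effect of a 10% increase in each input, one at a time.
for name in base_inputs:
    delta = sensitivity(profit, base_inputs, name, 0.10)
    print(f"{name}: {delta:+.2f}")
```

Inputs that produce the largest swings are the ones that deserve better estimates or closer monitoring.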
What type of VIM models display a visual image of the result of one decision alternative at a time?
Static
In what ways does the Web pose great challenges for effective and efficient knowledge discovery through data mining?
The Web is too big for effective data mining; the Web is too complex; the Web is too dynamic; the Web is not specific to a domain; the Web has everything.
Which of the following is NOT a characteristic displayed by an LP allocation problem? - A limited quantity of economic resources is available for allocation. - The resources are used in the production of products or services. - There are two or more ways in which the resources can be used. - The problem is not bound by constraints.
The problem is not bound by constraints
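To make the allocation characteristics concrete, here is a small sketch with hypothetical numbers. It is not an LP solver: it simply enumerates whole-number production plans and keeps the best feasible one, which is enough to show limited resources, competing uses, constraints, and a single measure of return.

```python
# Hypothetical product-mix problem (all coefficients are assumptions):
#   maximize   3*x + 5*y          (total return)
#   subject to x       <= 4       (resource 1)
#              2*y     <= 12      (resource 2)
#              3*x+2*y <= 18      (resource 3)
#              x, y >= 0 and whole numbers (a simplification for enumeration)


def is_feasible(x, y):
    return x <= 4 and 2 * y <= 12 and 3 * x + 2 * y <= 18


def total_return(x, y):
    return 3 * x + 5 * y


def best_allocation(max_x=4, max_y=6):
    """Brute-force search over the integer grid of candidate plans."""
    candidates = [
        (x, y)
        for x in range(max_x + 1)
        for y in range(max_y + 1)
        if is_feasible(x, y)
    ]
    return max(candidates, key=lambda plan: total_return(*plan))


best_plan = best_allocation()
print(best_plan, total_return(*best_plan))
```

A real LP model would allow fractional values and would be solved with a method such as simplex rather than enumeration.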
List and briefly discuss the major components of a quantitative model.
These components include: 1. Result (outcome) variables reflect the level of effectiveness of a system; that is, they indicate how well the system performs or attains its goal(s). 2. Decision variables describe alternative courses of action. The decision maker controls the decision variables. 3. Uncontrollable variables are factors in a decision-making situation that affect the result variables but are not under the control of the decision maker. 4. Intermediate result variables reflect intermediate outcomes in mathematical models.
How are linear programming models vulnerable when used in complex situations?
These models can be vulnerable when used in very complex situations for a number of reasons. One reason is that not all parameters can be known or understood. Another concern is that the standard characteristics of a linear programming calculation may not hold in more dynamic, real-world environments. Additionally, in more complex environments, not all actors may be wholly rational or focused on economic issues.
What do voice of the market (VOM) applications of sentiment analysis do?
They examine customer sentiment at the aggregate level.
What is one major way in which Web-based social media differs from traditional publishing media?
They have different costs to own and operate.
Why is there a trend to developing and using cloud-based tools for modeling?
This trend exists because it simplifies the process for users. These systems give them access to powerful tools and pre-existing models that they can use to solve business problems. Because these systems are cloud-based, there are costs associated with operating them and maintaining them.
Which of the following is NOT an assumption used by an LP allocation problem? - The resources are to be used in the most economical manner. - The return from any allocation is independent of other allocations. - Total returns cannot be compared. - All data are known with certainty.
Total returns cannot be compared
A decision made under risk is also known as a probabilistic or stochastic decision-making situation. T/F
True
Decision situations that involve a finite and usually not too large number of alternatives are modeled through an approach called decision analysis. T/F
True
Every LP model has some internal intermediate variables that are not explicitly stated. T/F
True
In the School District of Philadelphia case, Excel and an add-in were used to evaluate different vendor options. T/F
True
Many quantitative models of decision theory are based on comparing a single measure of effectiveness, generally some form of utility to the decision maker. T/F
True
Modeling is a key element for prescriptive analytics. T/F
True
Online commerce and communication have created an immense need for forecasting and an abundance of available information for performing it. T/F
True
Regional accents present challenges for natural language processing. T/F
True
Simulation is normally used only when a problem is too complex to be treated using numerical optimization techniques. T/F
True
Simulation is the appearance of reality. T/F
True
The pessimistic approach assumes that the worst possible outcome for each alternative will occur and selects the best of these. T/F
True
The ________ approach can be used in conjunction with artificial intelligence.
VIM
In text analysis, what is a lexicon?
a catalog of words, their synonyms, and their meanings
In the opening vignette, the architectural system that supported Watson used all the following elements EXCEPT - massive parallelism to enable simultaneous consideration of multiple hypotheses. - an underlying confidence subsystem that ranks and integrates answers. - a core engine that could operate seamlessly in another domain without changes. - integration of shallow and deep knowledge.
a core engine that could operate seamlessly in another domain without changes.
Spreadsheets use ________ to extend their functionality.
add-ins
The components of a quantitative model are linked by ________ expressions.
algebraic
Risk ________ is a decision-making method that analyzes the risk (based on assumed known probabilities) associated with different alternatives.
analysis
What does Web content mining involve?
analyzing the unstructured content of Web pages
In text mining, tokenizing is the process of
categorizing a block of text in a sentence.
When the decision maker knows exactly what the outcome of each course of action will be, this is decision making under
certainty
Which of the following is NOT a component of a quantitative model?
classes
A more general form of an influence diagram is called a(n)
cognitive map
If a simulation result does NOT match the intuition or judgment of the decision maker, what can occur?
confidence gap
Multiple goals is a decision situation in which alternatives are evaluated with several, sometimes ________, goals.
conflicting
In the Tito's Vodka case, it was important that social media users all had a(n) ________ brand experience.
consistent
At a very high level, the text mining process can be broken down into three consecutive tasks, the first of which is to establish the ________
corpus
Web ________ are used to automatically read through the contents of Web sites.
crawlers/spiders
Every LP model is composed of ________ variables whose values are unknown and are searched for.
decision
Because the term document matrix is often very large and rather sparse, an important optimization step is to reduce the ________ of the matrix.
dimensionality
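A minimal sketch of what a term-document matrix looks like and one simple way to shrink it. The documents are made up, and dropping rare terms is only one assumed option; the source does not name a specific reduction method.

```python
from collections import Counter


def term_document_matrix(docs):
    """Return (terms, matrix) where matrix[i][j] counts term i in document j."""
    counts = [Counter(doc.lower().split()) for doc in docs]
    terms = sorted(set().union(*counts))
    matrix = [[c[term] for c in counts] for term in terms]
    return terms, matrix


def prune_rare_terms(terms, matrix, min_docs=2):
    """Keep only terms that occur in at least min_docs documents."""
    kept = [
        (term, row)
        for term, row in zip(terms, matrix)
        if sum(1 for n in row if n > 0) >= min_docs
    ]
    return [term for term, _ in kept], [row for _, row in kept]


# Hypothetical mini-corpus.
docs = [
    "data mining finds patterns",
    "text mining finds themes",
    "web mining",
]
terms, matrix = term_document_matrix(docs)
kept_terms, kept_matrix = prune_rare_terms(terms, matrix)
print(len(terms), "terms before pruning;", len(kept_terms), "after")
```

Most cells in the full matrix are zero, which is the sparsity the question refers to.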
All of the following are challenges associated with natural language processing EXCEPT - dividing up a text into individual words in English. - understanding the context in which something is said. - distinguishing between words that have more than one meaning. - recognizing typographical or grammatical errors in texts.
dividing up a text into individual words in English
A(n) ________ model can be constructed under assumed environments of certainty.
dynamic
A(n) ________ spreadsheet model represents behavior over time.
dynamic
This method calculates the values of the inputs necessary to generate a zero profit outcome.
goal seek
The most common method for solving a risk analysis problem is to select the alternative with the
greatest expected value
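As a rough illustration, the expected value of each alternative is the probability-weighted sum of its payoffs, and the alternative with the largest sum is chosen. The payoff table and probabilities below are hypothetical.

```python
# Hypothetical payoffs for each alternative under three states of nature.
payoffs = {
    "bonds": [12.0, 6.0, 3.0],
    "stocks": [15.0, 3.0, -2.0],
    "cds": [6.5, 6.5, 6.5],
}
# Assumed known probabilities of the three states (they sum to 1).
probabilities = [0.5, 0.3, 0.2]


def expected_value(payoff_row, probs):
    return sum(p * x for p, x in zip(probs, payoff_row))


def best_by_expected_value(table, probs):
    return max(table, key=lambda alt: expected_value(table[alt], probs))


choice = best_by_expected_value(payoffs, probabilities)
print(choice, expected_value(payoffs[choice], probabilities))
```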
Understanding which keywords your users enter to reach your Web site through a search engine can help you understand
how well visitors understand your products.
A(n) ________ is one or more Web pages that provide a collection of links to authoritative Web pages.
hub
Web pages contain both unstructured information and ________, which are connections to other Web pages.
hyperlinks
A(n) ________ is a graphical representation of a model.
influence diagram
A decision tree can be cumbersome if there are
many alternatives
Intermediate result variables reflect intermediate outcomes in
mathematical models
What are the two main types of Web analytics?
off-site and on-site Web analytics
Factors that are not under the control of the decision maker but can be fixed, are called ________.
parameters
Important spreadsheet features for modeling include all of the following EXCEPT - what-if analysis. - goal seeking. - macros. - pivot tables.
pivot tables
When viewed as a binary feature, ________ classification is the binary classification task of labeling an opinionated document as expressing either an overall positive or an overall negative opinion.
polarity
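A toy sketch of binary polarity classification, assuming a tiny hand-made lexicon. The word lists and the tie-breaking rule are assumptions for illustration; real projects use much larger lexicons or trained classifiers.

```python
# Hypothetical mini-lexicon; not taken from any published resource.
POSITIVE = {"good", "great", "excellent", "love", "reliable"}
NEGATIVE = {"bad", "poor", "terrible", "hate", "broken"}


def polarity(text):
    """Label text 'positive' or 'negative' by counting lexicon hits.

    Ties are labeled 'positive' here purely to keep the output binary.
    """
    words = [w.strip(".,!?").lower() for w in text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score >= 0 else "negative"


print(polarity("Great picture and reliable sound!"))
print(polarity("Terrible remote, broken after a week."))
```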
In ________ simulation, one or more of the independent variables (e.g., the demand in an inventory problem) are subject to chance variation.
probabilistic
A(n) ________ Web site contains links that send traffic directly to your Web site.
referral
When the decision maker must consider several possible outcomes for each alternative, each with a given probability of occurrence, this is decision making under
risk
A(n) ________ engine is a software program that searches for Web sites or files based on keywords.
search
In the research literature case study, the researchers analyzing academic papers extracted information from which source?
the paper abstract
Sentiment analysis projects require a lexicon for use. If a project in English is undertaken, you must generally make sure to
use an English lexicon appropriate to the project at your discretion
Identification of a model's variables (e.g., decision, result, uncontrollable) is critical, as are the relationships among the ________.
variables
Selecting the best ________ to work with is a laborious yet important task for companies and government organizations.
vendors
When a word has more than one meaning, selecting the meaning that makes the most sense can only be accomplished by taking into account the context within which the word is used. This concept is known as ________.
word sense disambiguation
Which of the following statements about Web site conversion statistics is FALSE? - Web site visitors can be classed as either new or returning. - Visitors who begin a purchase on most Web sites must complete it. - The conversion rate is the number of people who take action divided by the number of visitors. - Analyzing exit rates can tell you why visitors left your Web site.
Visitors who begin a purchase on most Web sites must complete it.
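The conversion rate definition in the options above is simple arithmetic. A minimal sketch, with hypothetical visitor figures:

```python
def conversion_rate(actions, visitors):
    """Number of people who take action divided by the number of visitors."""
    if visitors <= 0:
        raise ValueError("visitors must be positive")
    return actions / visitors


# Hypothetical figures: 45 sign-ups from 1,500 visitors.
rate = conversion_rate(45, 1500)
print(f"{rate:.1%}")
```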
Identify, with a brief description, each of the four steps in the sentiment analysis process.
1. Sentiment Detection: Here the goal is to differentiate between a fact and an opinion, which may be viewed as classification of text as objective or subjective. 2. N-P Polarity Classification: Given an opinionated piece of text, the goal is to classify the opinion as falling under one of two opposing sentiment polarities, or locate its position on the continuum between these two polarities. 3. Target Identification: The goal of this step is to accurately identify the target of the expressed sentiment. 4. Collection and Aggregation: In this step all text data points in the document are aggregated and converted to a single sentiment measure for the whole document.
Which of the following is NOT an assumption used by an LP allocation problem? - Returns from different allocations can be compared. - The return from any allocation is independent of other allocations. - The total return is the sum of the returns yielded by the different activities. - All data are unknown with decision making under uncertainty.
All data are unknown with decision making under uncertainty.
What is the difference between white hat and black hat SEO activities?
An SEO technique is considered white hat if it conforms to the search engines' guidelines and involves no deception. Because search engine guidelines are not written as a series of rules or commandments, this is an important distinction to note. White-hat SEO is not just about following guidelines, but about ensuring that the content a search engine indexes and subsequently ranks is the same content a user will see. Black-hat SEO attempts to improve rankings in ways that the search engines disapprove of, or that involve deception or trying to divert search engine algorithms from their intended purpose.
In sentiment analysis, sentiment suggests a transient, temporary opinion reflective of one's feelings. T/F
False
In the evolution of social media user engagement, the largest recent change is the growth of creators. T/F
False
Search engine optimization (SEO) techniques play a minor role in a Web site's search ranking because only well-written content matters. T/F
False
Search engines are only used in the context of the World Wide Web (WWW). T/F
False
Simulations are an experimental, expensive, error-prone method for gaining insight into complex decision-making situations. T/F
False
Since little can be done about visitor Web site abandonment rates, organizations have to focus their efforts on increasing the number of new visitors. T/F
False
Spreadsheets are clearly the most popular developer modeling tool. T/F
False
Spreadsheets include all possible tools needed to deploy a custom DSS. T/F
False
Text analytics is the subset of text mining that handles information retrieval and extraction, plus data mining. T/F
False
Why are the users' page views and time spent on your Web site important metrics?
If people come to your Web site and don't view many pages, that is undesirable and your Web site may have issues with its design or structure. Another explanation for low page views is a disconnect between the marketing messages that brought them to the site and the content that is actually available. Generally, the longer a person spends on your Web site, the better it is. That could mean they're carefully reviewing your content, utilizing interactive components you have available, and building toward an informed decision to buy, respond, or take the next step you've provided. On the other hand, the time on site also needs to be examined against the number of pages viewed to make sure the visitor isn't spending his or her time trying to locate content that should be more readily accessible.
Natural language processing (NLP), a subfield of artificial intelligence and computational linguistics, is an important component of text mining. What is the definition of NLP?
NLP is a discipline that studies the problem of understanding the natural human language, with the view of converting depictions of human language into more formal representations in the form of numeric and symbolic data that are easier for computer programs to manipulate.
________ Web analytics refers to measurement and analysis of data relating to your company that takes place outside your Web site.
Off-site
________, also called homonyms, are syntactically identical words with different meanings.
Polysemes
________ is a connections metric for social networks that measures the ties that actors in a network have with others that are geographically close.
Propinquity
A probabilistic decision-making situation is a decision made under ________.
Risk
What is search engine optimization (SEO) and why is it important for organizations that own Web sites?
Search engine optimization (SEO) is the intentional activity of affecting the visibility of an e-commerce site or a Web site in a search engine's natural (unpaid or organic) search results. In general, the higher ranked on the search results page, and more frequently a site appears in the search results list, the more visitors it will receive from the search engine's users. Being indexed by search engines like Google, Bing, and Yahoo! is not good enough for businesses. Getting ranked on the most widely used search engines and getting ranked higher than your competitors are what make the difference.
Provide some examples where a sensitivity analysis may be used.
Sensitivity analyses are used for: • Revising models to eliminate too-large sensitivities • Adding details about sensitive variables or scenarios • Obtaining better estimates of sensitive external variables • Altering a real-world system to reduce actual sensitivities • Accepting and using the sensitive (and hence vulnerable) real world, leading to the continuous and close monitoring of actual results
________ is a technique used to detect favorable and unfavorable opinions toward specific products and services using large numbers of textual data sources.
Sentiment Analysis
Which of the following is NOT a disadvantage of a simulation? - An optimal solution cannot be guaranteed, but relatively good ones are generally found. - Simulation software sometimes requires special skills because of the complexity of the formal solution method. - Simulation is often the only DSS modeling method that can readily handle relatively unstructured problems. - Simulation model construction can be a slow and costly process, although newer modeling systems are easier to use than ever.
Simulation is often the only DSS modeling method that can readily handle relatively unstructured problems.
Why are spreadsheet applications so commonly used for decision modeling?
Spreadsheets are often used for this purpose because they are very approachable and easy to use for end users. Spreadsheets have a shallow learning curve that allows basic functions to be learned quickly. Additionally, spreadsheets have evolved over time to include a more robust set of features and functions. These functions can also be augmented through the use of add-ins, many of which are designed with decision support systems in mind.
Why is the Monte Carlo simulation popular for solving business problems?
The Monte Carlo simulation is a probabilistic simulation. It is designed around a model of the decision problem, but the base model does not itself account for the uncertainty of any of the variables. The simulation supplies that uncertainty by running the model a huge number of times with random changes within each of the variables. In this way, the model may be solved hundreds or thousands of times before the simulation is completed. The results can then be analyzed for either the dependent or performance variables using statistical distributions. This demonstrates a number of possible solutions and provides information about the manner in which variables will respond under different levels of uncertainty.
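The process described in this answer can be sketched as follows. The profit model, the distributions for demand and unit cost, and all numeric values are hypothetical; the point is only to show a model being solved many times with random inputs and then summarized statistically.

```python
import random
import statistics


def simulate_profit(rng):
    """One trial: draw the uncertain inputs, then evaluate the model."""
    demand = max(rng.gauss(1000, 200), 0)  # assumed: normally distributed demand
    unit_cost = rng.uniform(4.0, 6.0)      # assumed: uniformly distributed cost
    price = 10.0                           # fixed, hypothetical
    fixed_cost = 3000.0                    # fixed, hypothetical
    return demand * (price - unit_cost) - fixed_cost


def monte_carlo(trials=10_000, seed=42):
    rng = random.Random(seed)  # seeded so runs are repeatable
    results = [simulate_profit(rng) for _ in range(trials)]
    return {
        "mean": statistics.mean(results),
        "stdev": statistics.stdev(results),
        "p_loss": sum(r < 0 for r in results) / trials,
    }


summary = monte_carlo()
print(summary)
```

The summary gives a distribution of outcomes (average profit, spread, and chance of a loss) instead of a single point estimate.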
In sentiment analysis, which of the following is an implicit opinion?
The customer service I got for my TV was laughable.
List and describe the most common approaches for treating uncertainty.
There are two common approaches to dealing with uncertainty. The first is the optimistic approach and the second is the pessimistic approach. The optimistic approach assumes that the outcome for each alternative will be the best possible and then selects the best of those. Under the pessimistic approach, the worst possible outcome is assumed for each alternative and then the best of the worst is selected.
Which of the following is NOT a characteristic displayed by an LP allocation problem? - Each activity in which the resources are used yields a return in terms of the stated goal. - The resources are used in the production of products or services. - There is a single way in which the resources can be used. - The allocation is usually restricted by several limitations and requirements.
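Both approaches reduce to a simple rule over a payoff table. A minimal sketch with a hypothetical table, where no probabilities are needed:

```python
# Hypothetical payoffs for each alternative under three possible outcomes.
payoffs = {
    "bonds": [12.0, 6.0, 3.0],
    "stocks": [15.0, 3.0, -2.0],
    "cds": [6.5, 6.5, 6.5],
}


def optimistic_choice(table):
    """Best of the best: pick the alternative with the highest maximum payoff."""
    return max(table, key=lambda alt: max(table[alt]))


def pessimistic_choice(table):
    """Best of the worst: pick the alternative with the highest minimum payoff."""
    return max(table, key=lambda alt: min(table[alt]))


print(optimistic_choice(payoffs), pessimistic_choice(payoffs))
```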
There is a single way in which the resources can be used.
Describe the query-specific clustering method as it relates to clustering.
This method employs a hierarchical clustering approach where the most relevant documents to the posed query appear in small tight clusters that are nested in larger clusters containing less similar documents, creating a spectrum of relevance levels among the documents.
Why do many believe that making decisions under uncertainty is more difficult than making decisions under risk?
This opinion is commonly held because making decisions under uncertainty allows for an unlimited number of possible outcomes, yet no understanding of the likelihood of those outcomes. In contrast, decision-making under risk allows for an unlimited number of outcomes, but a known probability of the likelihood of those outcomes.
Articles and auxiliary verbs are assigned little value in text mining and are usually filtered out. T/F
True
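A minimal sketch of this filtering step. The stop-word list below is a small hand-picked assumption, not a standard list.

```python
# Hypothetical stop-word list: a few articles and auxiliary verbs.
STOP_WORDS = {
    "a", "an", "the",
    "is", "are", "was", "were", "be", "been",
    "has", "have", "had", "do", "does", "did",
}


def remove_stop_words(text):
    """Lowercase, split on whitespace, and drop words in STOP_WORDS."""
    return [word for word in text.lower().split() if word not in STOP_WORDS]


print(remove_stop_words("The model has been validated"))
```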
Categorization and clustering of documents during text mining differ only in the preselection of categories. T/F
True
Clickstream analysis does not need users to enter their perceptions of the Web site or other feedback directly to be useful in determining their preferences. T/F
True
Current use of sentiment analysis in voice of the customer applications allows companies to change their products or services in real time in response to customer sentiment. T/F
True
In sentiment analysis, it is hard to classify some subjects such as news as good or bad, but easier to classify others, e.g., movie reviews, in the same way. T/F
True
In text mining, if an association between two concepts has 7% support, it means that 7% of the documents had both concepts represented in the same document. T/F
True
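Support can be computed directly from that definition. In this sketch each document is represented as a set of the concepts it mentions; the concepts and documents are hypothetical.

```python
def support(documents, concept_a, concept_b):
    """Fraction of documents in which both concepts appear together."""
    if not documents:
        raise ValueError("documents must not be empty")
    both = sum(
        1 for concepts in documents
        if concept_a in concepts and concept_b in concepts
    )
    return both / len(documents)


# Hypothetical corpus: each set lists the concepts found in one document.
docs = [
    {"price", "quality"},
    {"price", "delivery"},
    {"quality", "delivery", "price"},
    {"delivery"},
]
print(support(docs, "price", "quality"))
```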
In the Tito's Vodka case study, trends in cocktails were studied to create a quarterly recipe for customers. T/F
True
In the Wimbledon case study, designers balanced the needs of mobile and desktop computer users. T/F
True
VIS uses animated computer graphic displays to present the impact of different managerial decisions. T/F
True
________ is mostly driven by sentiment analysis and is a key element of customer experience management initiatives, where the goal is to create an intimate relationship with the customer.
Voice of the customer (VOC)
Search engine optimization (SEO) is a means by which
Web site developers can increase Web site search rankings.
Web site usability may be rated poor if
Web site visitors download few of your offered PDFs and videos.
________ analysis is structured as "What will happen to the solution if an input variable, an assumption, or a parameter value is changed?"
What-if
In the Lotte.com retail case, the company deployed SAS for Customer Experience Analytics to better understand the quality of customer traffic on their Web site, classify order rates, and see which ________ had the most visitors.
channels
________ is a segmentation metric for social networks that measures the strength of the bonds between actors in a social network.
cohesion
This method calculates the values of the inputs necessary to achieve a desired level of an output.
goal seek
In the Mining for Lies case study, a text based deception-detection method used by Fuller and others in 2008 was based on a process known as ________, which relies on elements of data and text mining techniques.
message feature mining
How would you describe information extraction in text mining?
Information extraction is the identification of key phrases and relationships within text by looking for predefined objects and sequences in text by way of pattern matching.
The ________ approach assumes that the best possible outcome of each alternative will occur and then selects the best of the best.
optimistic
Breaking up a Web page into its components to identify worthy words/terms and indexing them using a set of rules is called
parsing the documents
In the Wimbledon case study, the tournament used data for each match in real time to highlight
significant events
Conventional ________ generally reports statistical results at the end of a set of experiments.
simulation
What types of documents are BEST suited to semantic labeling and aggregation to determine sentiment orientation?
small- to medium-sized documents
What are the three categories of social media analytics technologies and what do they do?
• Descriptive analytics: Uses simple statistics to identify activity characteristics and trends, such as how many followers you have, how many reviews were generated on Facebook, and which channels are being used most often. • Social network analysis: Follows the links between friends, fans, and followers to identify connections of influence as well as the biggest sources of influence. • Advanced analytics: Includes predictive analytics and text analytics that examine the content in online conversations to identify themes, sentiments, and connections that would not be revealed by casual surveillance.