NEW IS 335 Exam 1 UPDATED
A) mathematical models.
24) Intermediate result variables reflect intermediate outcomes in A) mathematical models. B) flowcharts. C) decision trees. D) ROI calculations.
C) risk.
25) When the decision maker must consider several possible outcomes for each alternative, each with a given probability of occurrence, this is decision making under A) certainty. B) uncertainty. C) risk. D) duress.
A) certainty.
26) When the decision maker knows exactly what the outcome of each course of action will be, this is decision making under A) certainty. B) uncertainty. C) risk. D) duress.
Data, Info, Knowledge, Wisdom
understanding relations/understanding patterns/understanding principles
In decision making under uncertainty, it is assumed that complete knowledge is available.
False
t/f Search engine optimization (SEO) techniques play a minor role in a Web site's search ranking because only well-written content matters.
False
Which of the following is NOT a disadvantage of a simulation?
Simulation is often the only DSS modeling method that can readily handle relatively unstructured problems.
Technical & Sourcing Challenges
Abundant technical options; changes in technologies and vendors; integration requirements; knowledge transfer challenges
Which type of question does visual analytics seek to answer? A) Why is it happening? B) What happened yesterday? C) What is happening today? D) When did it happen?
Why is it happening?
Spreadsheets use ________ to extend their functionality.
add-ins
MapReduce can be easily understood by skilled programmers due to its procedural nature.
true
Modeling is a key element for prescriptive analytics.
true
Online commerce and communication have created an immense need for forecasting and an abundance of available information for performing it.
true
Satellite data can be used to evaluate the activity at retail locations as a source of alternative data.
true
Result variables are considered independent variables.
False
Spreadsheets are clearly the most popular developer modeling tool.
False
In sentiment analysis, which of the following is an implicit opinion?
The customer service I got for my TV was laughable.
List and describe the most common approaches for treating uncertainty.
The most common way for managers to avoid uncertainty is to "assume it away" by gaining more information about the problem.
In the research literature case study, the researchers analyzing academic papers extracted information from which source?
the paper abstract
All of the following statements about data mining are true EXCEPT
the process aspect means that data mining should be a one-step process to results.
Traditional data warehouses have not been able to keep up with
the variety and complexity of data
What do voice of the market (VOM) applications of sentiment analysis do?
They examine customer sentiment at the aggregate level.
Benefits of the latest visual analytics tools, such as SAS Visual Analytics, include all of the following EXCEPT A) mobile platforms such as the iPhone are supported by these products. B) it is easier to spot useful patterns and trends in the data. C) they explore massive amounts of data in hours, not days. D) there is less demand on IT departments for reports.
they explore massive amounts of data in hours, not days.
During classification in data mining, a false positive is an occurrence classified as true by the algorithm while being false in reality.
true
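To make the false-positive definition concrete, here is a minimal sketch that tallies the four outcomes of a binary classifier. The function name `confusion_counts` and the fraud labels are hypothetical and only serve the illustration.

```python
def confusion_counts(actual, predicted):
    """Tally true/false positives and negatives for boolean labels."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if p and a:
            counts["TP"] += 1      # predicted true, actually true
        elif p and not a:
            counts["FP"] += 1      # predicted true, actually false
        elif not p and not a:
            counts["TN"] += 1      # predicted false, actually false
        else:
            counts["FN"] += 1      # predicted false, actually true
    return counts


# Hypothetical labels: True means "fraudulent transaction".
actual = [True, False, False, True, False]
predicted = [True, True, False, False, False]
counts = confusion_counts(actual, predicted)
print(counts)
```

The second record is the false positive: the algorithm flagged it as true while it was false in reality.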
For low latency, interactive reports, a data warehouse is preferable to Hadoop.
true
In sentiment analysis, it is hard to classify some subjects such as news as good or bad, but easier to classify others, e.g., movie reviews, in the same way.
true
In text mining, if an association between two concepts has 7% support, it means that 7% of the documents had both concepts represented in the same document.
true
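As a rough illustration of support, the sketch below counts the share of documents in which two concepts appear together. The documents and concept names are made up, and each document is assumed to be already reduced to a set of concepts.

```python
def support(documents, concept_a, concept_b):
    """Fraction of documents that contain both concepts."""
    both = sum(1 for doc in documents if concept_a in doc and concept_b in doc)
    return both / len(documents)


# Hypothetical documents, each represented as a set of extracted concepts.
docs = [
    {"price", "battery"},
    {"price", "screen"},
    {"battery", "price", "screen"},
    {"screen"},
]
pair_support = support(docs, "price", "battery")
print(f"support = {pair_support:.0%}")
```

Here two of the four documents mention both concepts, so the support is 50%; a 7% support would mean 7 out of every 100 documents contain both.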
In the opening vignette, Access Telecom (AT) built a system to better visualize customers who were unhappy before they canceled their service.
true
Regional accents present challenges for natural language processing.
true
Simulation is normally used only when a problem is too complex to be treated using numerical optimization techniques.
true
Simulation is the appearance of reality.
true
The cost of data storage has plummeted recently, making data mining feasible for more firms.
true
The term "Big Data" is relative as it depends on the size of the using organization.
true
When a problem has many attributes that impact the classification of different patterns, decision trees may be a useful approach.
true
t/f Clickstream analysis does not need users to enter their perceptions of the Web site or other feedback directly to be useful in determining their preferences.
true
What is the Hadoop Distributed File System (HDFS) designed to handle?
unstructured and semistructured non-relational data
Sentiment analysis projects require a lexicon for use. If a project in English is undertaken, you must generally make sure to
use an English lexicon appropriate to the project at your discretion.
22) In the Opening Vignette on Sports Analytics, what type of modeling was used to predict offensive tactics? A) heuristics B) heat maps C) cascaded decision trees D) sentiment analysis
B) heat maps
26) Which of the following BEST enables a data warehouse to handle complex queries and scale up to handle many more requests? A) use of the Web by users as a front-end B) parallel processing C) Microsoft Windows D) a larger IT staff
B) parallel processing
37) What type of analytics seeks to recognize what is going on as well as the likely forecast and make decisions to achieve the best performance possible? A) descriptive B) prescriptive C) predictive D) domain
B) prescriptive
29) The competitive imperatives for BI include all of the following EXCEPT A) right information B) right user C) right time D) right place
B) right user
35) Contextual metadata for a dashboard includes all the following EXCEPT A) whether any high-value transactions that would skew the overall trends were rejected as a part of the loading process. B) which operating system is running the dashboard server software. C) whether the dashboard is presenting "fresh" or "stale" information. D) when the data warehouse was last refreshed.
B) which operating system is running the dashboard server software.
Three Categories of Features
Benchmarking, Intelligence, Convenience
BI is an Entry Level of
Big Data
A decision table shows the relationships of the problem graphically and can handle complex situations in a compact form.
False
A model builder makes predictions and assumptions regarding input data, many of which deal with the assessment of certain futures.
False
All quantitative models are typically made up of six basic components.
False
Business analysis is the monitoring, scanning, and interpretation of collected environmental information.
False
VIS uses animated computer graphic displays to present the impact of different managerial decisions.
True
A data mining study is specific to addressing a well-defined business task, and different business tasks require
different sets of data
All of the following are challenges associated with natural language processing EXCEPT
dividing up a text into individual words in English.
Since little can be done about visitor Web site abandonment rates, organizations have to focus their efforts on increasing the number of new visitors.
false
Spreadsheets include all possible tools needed to deploy a custom DSS.
false
Text analytics is the subset of text mining that handles information retrieval and extraction, plus data mining.
false
Data mining can be very useful in detecting patterns such as credit card fraud, but is of little help in improving sales.
false
Statistics and data mining both look for data sets that are as large as possible.
false
t/f Companies understand that when their product goes "viral," the content of the online conversations about their product does not matter, only the volume of conversations.
false
This method calculates the values of the inputs necessary to achieve a desired level of an output.
goal seek
This method calculates the values of the inputs necessary to generate a zero profit outcome.
goal seek (break-even analysis)
________ is performed by indicating a target cell, its desired value, and a changing cell.
goal seeking
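Goal seeking can be sketched outside a spreadsheet as a simple root search: pick a target value for the output, then adjust one input until the model hits it. The code below is a minimal sketch using bisection, not the algorithm any particular spreadsheet uses. The `profit` model and its price and cost figures are hypothetical.

```python
def goal_seek(model, target, low, high, tol=1e-6, max_iter=200):
    """Find an input in [low, high] where model(input) is close to target."""
    f_low = model(low) - target
    f_high = model(high) - target
    if f_low * f_high > 0:
        raise ValueError("target is not bracketed by [low, high]")
    for _ in range(max_iter):
        mid = (low + high) / 2
        f_mid = model(mid) - target
        if abs(f_mid) < tol or (high - low) / 2 < tol:
            return mid
        if f_low * f_mid <= 0:
            high = mid                  # root lies in the lower half
        else:
            low, f_low = mid, f_mid     # root lies in the upper half
    return (low + high) / 2


def profit(units, price=25.0, unit_cost=15.0, fixed_cost=5000.0):
    """Hypothetical model: the 'target cell' as a function of the 'changing cell'."""
    return units * (price - unit_cost) - fixed_cost


# Break-even is the special case where the desired output is zero profit.
break_even_units = goal_seek(profit, target=0.0, low=0.0, high=10_000.0)
units_for_goal = goal_seek(profit, target=2000.0, low=0.0, high=10_000.0)
print(round(break_even_units), round(units_for_goal))
```

The target cell corresponds to `profit`, its desired value to `target`, and the changing cell to `units`.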
Many quantitative models of decision theory are based on comparing a single measure of effectiveness, generally some form of utility to the decision maker.
False
Simulations are an experimental, expensive, error-prone method for gaining insight into complex decision-making situations.
False
Presentation Capability Technologies
Goal: Maximize learning and decision making
In the Tito's Vodka case study, trends in cocktails were studied to create a quarterly recipe for customers.
true
(Input)Organizational Memory (small data)-> (output)Information Integration (Big data)
(OM=) Historical information and explicit knowledge accumulated over time (mainly structured and internal) ->Synthesized information about the past and present (structured and unstructured, external and internal)
A) static
39) What type of VIM models display a visual image of the result of one decision alternative at a time? A) static B) dynamic C) DSS D) VIS
What is Big Data's relationship to the cloud?
Amazon and Google have working Hadoop cloud offerings.
25) In what decade did disjointed information systems begin to be integrated? A) 1970s B) 1980s C) 1990s D) 2000s
B) 1980s
36) Today, many vendors offer diversified tools, some of which are completely preprogrammed (called shells). How are these shells utilized? A) They are used for customization of BI solutions. B) All a user needs to do is insert the numbers. C) The shell provides a secure environment for the organization's BI data. D) They host an enterprise data warehouse that can assist in decision making.
B) All a user needs to do is insert the numbers.
39) How does the use of cloud computing affect the scalability of a data warehouse? A) Cloud computing vendors bring as much hardware as needed to users' offices. B) Hardware resources are dynamically allocated as use increases. C) Cloud vendors are mostly based overseas where the cost of labor is low. D) Cloud computing has little effect on a data warehouse's scalability.
B) Hardware resources are dynamically allocated as use increases.
36) What is Six Sigma? A) a letter in the Greek alphabet that statisticians use to measure process variability B) a methodology aimed at reducing the number of defects in a business process C) a methodology aimed at reducing the amount of variability in a business process D) a methodology aimed at measuring the amount of variability in a business process
B) a methodology aimed at reducing the number of defects in a business process
You are tasked with accumulating survey data on a web page and are responsible for it being free from dirty data once you close the survey and get the data to the researching team. Which is the best way to handle the possibility of dirty data?
Build a website that validates data as the survey participant takes the survey.
26) Relational databases began to be used in the A) 1960s. B) 1970s. C) 1980s. D) 1990s.
C) 1980s.
31) Online transaction processing (OLTP) systems handle a company's routine ongoing business. In contrast, a data warehouse is typically A) the end result of BI processes and operations. B) a repository of actionable intelligence obtained from a data mart. C) a distinct system that provides storage for data that will be made use of in analysis. D) an integral subsystem of an online analytical processing (OLAP) system.
C) a distinct system that provides storage for data that will be made use of in analysis.
28) Which of the following is an umbrella term that combines architectures, tools, databases, analytical tools, applications, and methodologies? A) MIS B) DSS C) ERP D) BI
D) BI
39) Which of the following statements about Big Data is true? A) Data chunks are stored in different locations on one computer. B) Hadoop is a type of processor used to process Big Data applications. C) MapReduce is a storage filing system. D) Pure Big Data systems do not involve fault tolerance.
D) Pure Big Data systems do not involve fault tolerance.
34) BI applications must be integrated with A) databases. B) legacy systems. C) enterprise systems. D) all of these
D) all of these
37) This measure of central tendency is the sum of all the values/observations divided by the number of observations in the data set. A) dispersion B) mode C) median D) arithmetic mean
D) arithmetic mean
Four Contributions of BI
Dissemination of user-friendly, real-time information; creation of new knowledge based on the past; responsive and anticipative decisions; improved planning for the future
ERP-Conflicts
ERP systems do not drive innovation. The tie between ERP and BI is that a standardized, integrated enterprise infrastructure creates better opportunities for the organization to be more agile and adopt innovation. ERP focuses on commoditization. With a standardized infrastructure, the organization can focus on using BI to respond with agility to environmental signals.
Data Modeling Techniques
Entity-Relationship (ER) Modeling; Corporate Information Factory (CIF); Dimensional Modeling; Package Approach
Classification of Knowledge Repositories
Incident Report Databases; Alert Systems; Best Practice Databases; Lessons Learned Systems (LLS); Expertise Locator Systems
Business Intelligence (BI)
Inputs: data; information. Outputs: information presented in a friendly fashion; new knowledge or insight.
DW-Vendors
Oracle NCR Teradata Open Source Versions like MySQL
Basis of ch 3
Store data/ Make data approachable and useful
Insight Creation Capability
The ability to develop new insights and use them in the short-term or long-term to make better decisions
Information Integration Capability
The ability to link structured and unstructured data from a variety of sources. Ex: your info is collected at a casino (which also collects competitors' public information).
Organizational Memory Capability
The ability to store information (data) and knowledge. Later data will be separated by structured/unstructured data
Presentation Capability
The ability to use appropriate reporting and balanced scorecard tools, and thereby make BI more valuable to users. Ex: presenting (selling) differently to different people, such as the CEO vs. the CFO.
Differentiation of Knowledge Repositories
The differences among the knowledge repositories are based on: Content origin Application Result Orientation
All of the following statements about data mining are true EXCEPT: A) The term is relatively new. B) Its techniques have their roots in traditional statistical analysis and artificial intelligence. C) The ideas behind it are relatively new. D) Intense, global competition makes its application more important.
The ideas behind it are relatively new.
Which of the following is NOT a characteristic displayed by a LP allocation problem?
The problem is not bound by constraints.
Which of the following is NOT a characteristic displayed by a LP allocation problem?
There is a single way in which the resources can be used.
How are linear programming models vulnerable when used in complex situations?
They are vulnerable because an infinite number of solutions exist and in complex situations it can be hard to determine the optimal solution.
What is one major way in which Web-based social media differs from traditional publishing media?
They have different costs to own and operate.
Info integration + insight creation ->data
intelligence
Provide some examples where a sensitivity analysis may be used.
If you're planning a business trip, you might compare the cost of a rental car and driving against commercial airfare. However, what if the cost of gasoline goes up between now and then? With the rising cost of fuel, do airfares go up as well? These factors could affect your costs and your ultimate decision. By using sensitivity analysis, you can explore various scenarios and make better decisions as a result.
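The trip example can be sketched as a one-variable sensitivity analysis: vary the gasoline price and see where the preferred option flips. All figures below (mileage, fuel economy, rental cost, airfare) are hypothetical assumptions chosen for illustration.

```python
def driving_cost(gas_price, miles=600, mpg=30, rental=120):
    """Hypothetical trip cost by car: rental fee plus fuel."""
    return rental + miles / mpg * gas_price


def cheaper_option(gas_price, airfare=200):
    """Compare driving against a fixed, assumed airfare."""
    return "drive" if driving_cost(gas_price) < airfare else "fly"


# Vary one input (gas price) and watch how the decision responds.
for gas_price in [3.0, 3.5, 4.0, 4.5, 5.0]:
    cost = driving_cost(gas_price)
    print(f"gas ${gas_price:.2f}: driving ${cost:.2f} -> {cheaper_option(gas_price)}")
```

Under these assumptions the decision switches from driving to flying once gas reaches about $4.00 per gallon, which is the kind of threshold a sensitivity analysis is meant to expose.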
Search engines are only used in the context of the World Wide Web (WWW).
false
List and briefly discuss the major components of a quantitative model.
1. Uncontrollable Variables 2. Decision Variables: alternative courses of action 3. Intermediate Variables: intermediate outcomes in the mathematical model 4. Result Variables: reflect effectiveness of a system
32) The very design that makes an OLTP system efficient for transaction processing makes it inefficient for A) end-user ad hoc reports, queries, and analysis. B) transaction processing systems that constantly update operational databases. C) the collection of reputable sources of intelligence. D) transactions such as ATM withdrawals, where we need to reduce a bank balance accordingly.
A) end-user ad hoc reports, queries, and analysis.
33) What is the management feature of a dashboard? A) operational data that identify what actions to take to resolve a problem B) summarized dimensional data to analyze the root cause of problems C) summarized dimensional data to monitor key performance metrics D) graphical, abstracted data to monitor key performance metrics
A) operational data that identify what actions to take to resolve a problem
38) This measure of dispersion is calculated by simply taking the square root of the variance. A) standard deviation B) range C) variance D) arithmetic mean
A) standard deviation
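A short sketch of how the arithmetic mean, variance, and standard deviation relate. It assumes the sample (n - 1) form of the variance; the data values are made up.

```python
import math


def mean(values):
    """Arithmetic mean: sum of observations divided by their count."""
    return sum(values) / len(values)


def variance(values):
    """Sample variance: average squared deviation, using n - 1."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def std_dev(values):
    """Standard deviation: square root of the variance."""
    return math.sqrt(variance(values))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations
print(mean(data), variance(data), std_dev(data))
```

Python's standard `statistics` module offers `mean`, `variance`, and `stdev` for the same calculations.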
Why are spreadsheet applications so commonly used for decision modeling?
Because it is easy to manipulate the data without having to learn a coding language like SQL.
Why is there a trend to developing and using cloud-based tools for modeling?
Because it simplifies the application of many models to real-world problems. Using the cloud, data is more easily accessible
Why is the Monte Carlo simulation popular for solving business problems?
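Because it lets you model the uncertainty in input variables explicitly: uncertain inputs are sampled from probability distributions, and the resulting distribution of outcomes is examined.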
39) This plot is a graphical illustration of several descriptive statistics about a given data set. A) pie chart B) bar graph C) box-and-whiskers plot D) kurtosis
C) box-and-whiskers plot
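The descriptive statistics a box-and-whiskers plot usually summarizes can be computed directly. This is a minimal sketch using Python's `statistics` module; quartile conventions differ between tools, and the default method of `statistics.quantiles` is assumed here. The data values are made up.

```python
import statistics


def five_number_summary(values):
    """Minimum, first quartile, median, third quartile, maximum."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # three quartile cut points
    return {
        "min": min(values),
        "q1": q1,
        "median": statistics.median(values),
        "q3": q3,
        "max": max(values),
    }


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations
summary = five_number_summary(data)
print(summary)
```

The box spans q1 to q3 with a line at the median, and the whiskers extend toward the minimum and maximum (or to an outlier cutoff, depending on the plotting tool).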
35) What has caused the growth of the demand for instant, on-demand access to dispersed information? A) the increasing divide between users who focus on the strategic level and those who are more oriented to the tactical level B) the need to create a database infrastructure that is always online and contains all the information from the OLTP systems C) the more pressing need to close the gap between the operational data and strategic objectives D) the fact that BI cannot simply be a technical exercise for the information systems department
C) the more pressing need to close the gap between the operational data and strategic objectives
40) Big Data often involves a form of distributed storage and processing using Hadoop and MapReduce. One reason for this is A) centralized storage creates too many vulnerabilities. B) the "Big" in Big Data necessitates over 10,000 processing nodes. C) the processing power needed for the centralized model would overload a single computer. D) Big Data systems have to match the geographical spread of social media.
C) the processing power needed for the centralized model would overload a single computer.
36) Dashboards can be presented at all the following levels EXCEPT A) the visual dashboard level. B) the static report level. C) the visual cube level. D) the self-service cube level.
C) the visual cube level.
Which data mining process/methodology is thought to be the most comprehensive, according to kdnuggets.com rankings? A) SEMMA B) proprietary organizational methodologies C) KDD Process D) CRISP-DM
CRISP-DM
Presentation ->data
Convince
35) When querying a dimensional database, a user went from summarized data to its underlying details. The function that served this purpose is A) dice. B) slice. C) roll-up. D) drill down.
D) drill down.
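Roll-up and drill down can be illustrated on a tiny in-memory data set. This is only a sketch with hypothetical sales rows and dimension names; real OLAP tools perform the same navigation over cubes or dimensional tables.

```python
from collections import defaultdict

# Hypothetical fact rows with two dimensions (region, store) and one measure.
sales = [
    {"region": "East", "store": "E1", "amount": 120},
    {"region": "East", "store": "E2", "amount": 80},
    {"region": "West", "store": "W1", "amount": 200},
    {"region": "East", "store": "E1", "amount": 50},
]


def roll_up(rows, dimension):
    """Summarize the measure along one dimension."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[dimension]] += row["amount"]
    return dict(totals)


def drill_down(rows, dimension, value, detail_dimension):
    """Go from one summarized value to its underlying detail."""
    subset = [r for r in rows if r[dimension] == value]
    return roll_up(subset, detail_dimension)


by_region = roll_up(sales, "region")
east_detail = drill_down(sales, "region", "East", "store")
print(by_region, east_detail)
```

Starting from the regional totals, drilling down into "East" reveals the per-store figures behind that summary.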
22) Key performance indicators (KPIs) are metrics typically used to measure A) database responsiveness. B) qualitative feedback. C) external results. D) internal results.
D) internal results.
Data Warehouse-Four levels
DW is an architecture that describes the atomic level in the enterprise's data model, which consists of four levels: Operational Level; Data Warehouse Level; Data Mart or Departmental Level (sub-database); Individual Data Level (sub-database)
Characteristics of a Mature Data Warehouse
Data; Architecture; Stability of the production environment; Warehouse staff; Users; Impact on users' skills and jobs; Applications; Cost & Benefits; Organizational impact
ERP collects Data -->
Data Warehouse Manages it
What do we want?
Insight creation. How to sell your creation is our goal.
Which of the following is a data mining myth? A) Data mining is a multistep process that requires deliberate, proactive design and use. B) Data mining requires a separate, dedicated database. C) The current state-of-the-art is ready to go for almost any business. D) Newer Web-based tools enable managers of all educational levels to do data mining.
Data mining requires a separate, dedicated database.
Study Inputs + Outputs of the four BIC (slide 17)
Is the basis of the whole class
In the Target case study, why did Target send maternity ads to a teen?
Target's analytic model suggested she was pregnant based on her buying habits.
A decision made under risk is also known as a probabilistic or stochastic decision-making situation.
True
Decision situations that involve a finite and usually not too large number of alternatives are modeled through an approach called decision analysis.
True
Every LP model has some internal intermediate variables that are not explicitly stated.
True
In the School District of Philadelphia case, Excel and an add-in were used to evaluate different vendor options.
True
The pessimistic approach assumes that the worst possible outcome for each alternative will occur and selects the best of these.
True
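The pessimistic (maximin) approach can be sketched in a few lines: find the worst payoff of each alternative, then choose the alternative whose worst case is best. The payoff table below is hypothetical.

```python
def pessimistic_choice(payoffs):
    """Maximin rule: take the best of the worst-case payoffs."""
    worst = {alt: min(outcomes) for alt, outcomes in payoffs.items()}
    best_alt = max(worst, key=worst.get)
    return best_alt, worst


# Hypothetical payoffs (percent return) under three states of the economy.
payoffs = {
    "bonds": [12, 6, 3],
    "stocks": [15, 3, -2],
    "cds": [6.5, 6.5, 6.5],
}
choice, worst_cases = pessimistic_choice(payoffs)
print(choice, worst_cases)
```

With these numbers the worst cases are 3, -2, and 6.5, so the pessimistic decision maker picks the CDs.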
In a Hadoop "stack," what is a slave node?
a node where data is stored and processed
What does Web content mining involve?
analyzing the unstructured content of Web pages
Which broad area of data mining applications partitions a collection of objects into natural groupings with similar features?
clustering
How to transform a business
data -> BI -> product (all moved forward by project teams)
ERP systems
implemented to bring enterprise infrastructure to Y2K compliance
In the Influence Health case study, what was the goal of the system? A) locating clinic patients B) understanding follow-up care C) decreasing operational costs D) increasing service use
increasing service use
What does the scalability of a data mining method refer to? A) its ability to predict the outcome of a previously unknown data set accurately B) its speed of computation and computational costs in using the model C) its ability to construct a prediction model efficiently given a large amount of data D) its ability to overcome noisy data to make somewhat accurate predictions
its ability to construct a prediction model efficiently given a large amount of data
The most common simulation method for business decision problems is the ________ simulation.
Monte Carlo
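A minimal Monte Carlo sketch for a business decision problem: sample the uncertain inputs many times, compute the outcome for each trial, and summarize the resulting distribution. The profit model, the distributions, and every number below are hypothetical assumptions.

```python
import random


def simulate_profit(trials=10_000, seed=42):
    """Run repeated random trials of a hypothetical profit model."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    profits = []
    for _ in range(trials):
        demand = max(rng.gauss(1000, 200), 0)   # uncertain demand in units
        unit_cost = rng.uniform(12, 18)         # uncertain unit cost
        profits.append(demand * (25 - unit_cost) - 8000)
    return profits


profits = simulate_profit()
mean_profit = sum(profits) / len(profits)
prob_loss = sum(p < 0 for p in profits) / len(profits)
print(f"mean profit: {mean_profit:.0f}, chance of a loss: {prob_loss:.1%}")
```

Instead of a single answer, the simulation yields a distribution of outcomes, from which estimates such as the probability of a loss can be read off.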
C) cognitive map.
21) A more general form of an influence diagram is called a(n) A) forecast. B) environmental scan. C) cognitive map. D) static model.
B) influence diagram
22) A(n) ________ is a graphical representation of a model. A) multidimensional analysis B) influence diagram C) OLAP model D) Whisker plot
C) classes
23) Which of the following is NOT a component of a quantitative model? A) result variables B) decision variables C) classes D) parameters
B) dynamic
27) A(n) ________ spreadsheet model represents behavior over time. A) static B) dynamic C) looped D) add-in
D) pivot tables.
28) Important spreadsheet features for modeling include all of the following EXCEPT A) what-if analysis. B) goal seeking. C) macros. D) pivot tables.
D) The problem is not bound by constraints.
29) Which of the following is NOT a characteristic displayed by a LP allocation problem? A) A limited quantity of economic resources is available for allocation. B) The resources are used in the production of products or services. C) There are two or more ways in which the resources can be used. D) The problem is not bound by constraints.
C) There is a single way in which the resources can be used.
30) Which of the following is NOT a characteristic displayed by a LP allocation problem? A) Each activity in which the resources are used yields a return in terms of the stated goal. B) The resources are used in the production of products or services. C) There is a single way in which the resources can be used. D) The allocation is usually restricted by several limitations and requirements.
D) All data are unknown with decision making under uncertainty.
31) Which of the following is NOT an assumption used by a LP allocation problem? A) Returns from different allocations can be compared. B) The return from any allocation is independent of other allocations. C) The total return is the sum of the returns yielded by the different activities. D) All data are unknown with decision making under uncertainty.
C) Total returns cannot be compared.
32) Which of the following is NOT an assumption used by a LP allocation problem? A) The resources are to be used in the most economical manner. B) The return from any allocation is independent of other allocations. C) Total returns cannot be compared. D) All data are known with certainty.
A) goal seek
33) This method calculates the values of the inputs necessary to achieve a desired level of an output. A) goal seek B) what-if C) sensitivity D) LP
A) goal seek
34) This method calculates the values of the inputs necessary to generate a zero profit outcome. A) goal seek B) what-if C) sensitivity D) break-even
B) greatest expected value.
35) The most common method for solving a risk analysis problem is to select the alternative with the A) smallest expected value. B) greatest expected value. C) mean expected value. D) median expected value.
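Selecting the alternative with the greatest expected value can be sketched as follows. The payoffs and probabilities are hypothetical; each alternative's expected value is the probability-weighted sum of its outcomes.

```python
def expected_values(payoffs, probabilities):
    """Probability-weighted payoff of each alternative."""
    return {
        alt: sum(p * x for p, x in zip(probabilities, outcomes))
        for alt, outcomes in payoffs.items()
    }


# Hypothetical payoffs under three states of the economy, with assumed probabilities.
payoffs = {
    "bonds": [12, 6, 3],
    "stocks": [15, 3, -2],
    "cds": [6.5, 6.5, 6.5],
}
probabilities = [0.5, 0.3, 0.2]

ev = expected_values(payoffs, probabilities)
best = max(ev, key=ev.get)
print(ev, best)
```

With these assumed numbers, bonds have the greatest expected value (about 8.4), so they would be selected under decision making under risk.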
C) many alternatives.
36) A decision tree can be cumbersome if there are A) uncertain results. B) few alternatives. C) many alternatives. D) pre-existing decision tables.
C) Simulation is often the only DSS modeling method that can readily handle relatively unstructured problems.
37) Which of the following is NOT a disadvantage of a simulation? A) An optimal solution cannot be guaranteed, but relatively good ones are generally found. B) Simulation software sometimes requires special skills because of the complexity of the formal solution method. C) Simulation is often the only DSS modeling method that can readily handle relatively unstructured problems. D) Simulation model construction can be a slow and costly process, although newer modeling systems are easier to use than ever.
D) Define the problem, Construct the simulation model, Test and validate the model, Design the experiment, Conduct the experiment, Evaluate the results, Implement the results.
38) Which of the following is the order of simulation methodology? A) Define the problem, Construct the simulation model, Test and validate the model, Design the experiment, Conduct the experiment, Implement the results, Evaluate the results. B) Construct the simulation model, Test and validate the model, Define the problem, Design the experiment, Conduct the experiment, Evaluate the results, Implement the results. C) Define the problem, Construct the simulation model, Test and validate the model, Evaluate the results, Implement the results, Design the experiment, Conduct the experiment. D) Define the problem, Construct the simulation model, Test and validate the model, Design the experiment, Conduct the experiment, Evaluate the results, Implement the results.
D) confidence gap
40) If a simulation result does NOT match the intuition or judgment of the decision maker, what can occur? A) read/write error B) visual distortion C) project failure D) confidence gap
The Eckerson survey of 2002 estimated the total cost (to the US yearly economy) of dirty data to be approximately:
$600 billion (USD)
data warehouse
A single logical repository for an organization's data
30) ________ is an evolving tool space that promises real-time data integration from a variety of sources, such as relational databases, Web services, and multidimensional databases. A) Enterprise information integration (EII) B) Enterprise application integration (EAI) C) Extraction, transformation, and load (ETL) D) None of these
A) Enterprise information integration (EII)
30) Which type of question does visual analytics seek to answer? A) Why is it happening? B) What happened yesterday? C) What is happening today? D) When did it happen?
A) Why is it happening?
34) When representing data in a data warehouse, using several dimension tables that are each connected only to a fact table means you are using which warehouse structure? A) star schema B) snowflake schema C) relational schema D) dimensional schema
A) star schema
22) Operational or transaction databases are product oriented, handling transactions that update the database. In contrast, data warehouses are A) subject-oriented and nonvolatile. B) product-oriented and nonvolatile. C) product-oriented and volatile. D) subject-oriented and volatile.
A) subject-oriented and nonvolatile.
29) Which approach to data warehouse integration focuses more on sharing process functionality than data across systems? A) extraction, transformation, and load B) enterprise application integration C) enterprise information integration D) enterprise function integration
B) enterprise application integration
intelligence
Ability to search and utilize data across disparate sources
Which of the following is NOT an assumption used by a LP allocation problem?
All data are unknown with decision making under uncertainty.
Knowledge Repositories
Also known as knowledge sharing systems. Include technologies that support: document management systems; digital content management systems; enterprise content management systems; Web content management systems
29) Which type of visualization tool can be very helpful when a data set contains location data? A) bar chart B) geographic map C) highlight table D) tree map
B) geographic map
33) How are enterprise resources planning (ERP) systems related to supply chain management (SCM) systems? A) different terms for the same system B) complementary systems C) mutually exclusive systems D) none of the above; these systems never interface
B) complementary systems
40) This technique makes no a priori assumption of whether one variable is dependent on the other(s) and is not concerned with the relationship between variables; instead it gives an estimate on the degree of association between the variables. A) regression B) correlation C) means test D) multiple regression
B) correlation
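A small sketch of the Pearson correlation coefficient, one common measure of the degree of association. The function name and sample data are illustrative assumptions. The calculation treats both variables the same way, so swapping them gives the same result, which matches the point that neither variable is assumed to be dependent.

```python
import math

def pearson_r(xs, ys):
    """Return the Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

A value near +1 or -1 indicates strong association; a value near 0 indicates little linear association.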
31) When you tell a story in a presentation, all of the following are true EXCEPT A) a story should make sense and order out of a lot of background noise. B) a well-told story should have no need for subsequent discussion. C) stories and their lessons should be easy to remember. D) the outcome and reasons for it should be clear at the end of your story.
B) a well-told story should have no need for subsequent discussion.
23) Kaplan and Norton developed a report that presents an integrated view of success in the organization called A) metric management reports. B) balanced scorecard-type reports. C) dashboard-type reports. D) visual reports.
B) balanced scorecard-type reports.
38) A large storage location that can hold vast quantities of data (mostly unstructured) in its native/raw format for future/potential analytics consumption is referred to as a(n) A) extended ASP. B) data cloud. C) data lake. D) relational database.
C) data lake.
23) Business applications have moved from transaction processing and monitoring to other activities. Which of the following is NOT one of those activities? A) problem analysis B) solution applications C) data monitoring D) mobile access
C) data monitoring
21) Which characteristic of data means that all the required data elements are included in the data set? A) data source reliability B) data accessibility C) data richness D) data granularity
C) data richness
27) The need for more versatile reporting than what was available in 1980s era ERP systems led to the development of what type of system? A) management information systems B) relational databases C) executive information systems D) data warehouses
C) executive information systems
33) All of the following are benefits of hosted data warehouses EXCEPT A) smaller upfront investment. B) better quality hardware. C) greater control of data. D) frees up in-house systems.
C) greater control of data.
28) Which data warehouse architecture uses a normalized relational warehouse that feeds multiple data marts? A) independent data marts architecture B) centralized data warehouse architecture C) hub-and-spoke data warehouse architecture D) federated architecture
C) hub-and-spoke data warehouse architecture
23) Which kind of data warehouse is created separately from the enterprise data warehouse by a department and not reliant on it for updates? A) sectional data mart B) public data mart C) independent data mart D) volatile data mart
C) independent data mart
26) The Internet emerged as a new medium for visualization and brought all the following EXCEPT A) worldwide digital distribution of visualization. B) immersive environments for consuming data. C) new forms of computation of business logic. D) new graphics displays through PC displays.
C) new forms of computation of business logic.
28) Which type of visualization tool can be very helpful when the intention is to show relative proportions of dollars per department allocated by a university administration? A) heat map B) bullet C) pie chart D) bubble chart
C) pie chart
38) What type of analytics seeks to determine what is likely to happen in the future? A) descriptive B) prescriptive C) predictive D) domain
C) predictive
30) Which of the following is NOT an example of transaction processing? A) ATM withdrawal B) bank deposit C) sales report D) cash register scans
C) sales report
37) Real-time data warehousing can be used to support the highest level of decision making sophistication and power. The major feature that enables this in relation to handling the data is A) country of (data) origin. B) nature of the data. C) speed of data transfer. D) source of the data.
C) speed of data transfer.
25) Which of the following is LEAST related to data/information visualization? A) information graphics B) scientific visualization C) statistical graphics D) graphic artwork
D) graphic artwork
32) Benefits of the latest visual analytics tools, such as SAS Visual Analytics, include all of the following EXCEPT A) mobile platforms such as the iPhone are supported by these products. B) it is easier to spot useful patterns and trends in the data. C) they explore massive amounts of data in hours, not days. D) there is less demand on IT departments for reports.
C) they explore massive amounts of data in hours, not days.
25) A Web client that connects to a Web server, which is in turn connected to a BI application server, is reflective of a A) one-tier architecture. B) two-tier architecture. C) three-tier architecture. D) four-tier architecture.
C) three-tier architecture.
Why Presentation Capability?
Content and format needs differ, e.g., by role (CEO vs. CFO), task (security vs. financial need), and personal preference
Elements of the Enterprise Architecture
Core business processes Sharing of data driving core processes Key linking and automation technologies Key customers
convenience
Customization and connectivity
21) Why is a performance management system superior to a performance measurement system? A) because performance measurement systems are only in their infancy B) because measurement automatically leads to problem solution C) because performance management systems cost more D) because measurement alone has little use without action
D) because measurement alone has little use without action
27) Which kind of chart is described as an enhanced version of a scatter plot? A) heat map B) bullet C) pie chart D) bubble chart
D) bubble chart
31) In which stage of extraction, transformation, and load (ETL) into a data warehouse are anomalies detected and corrected? A) transformation B) extraction C) load D) cleanse
D) cleanse
24) Which characteristic of data requires that the variables and data values be defined at the lowest (or as low as required) level of detail for the intended use of the data? A) data source reliability B) data accessibility C) data richness D) data granularity
D) data granularity
34) What is the fundamental challenge of dashboard design? A) ensuring that users across the organization have access to it B) ensuring that the organization has the appropriate hardware onsite to support it C) ensuring that the organization has access to the latest Web browsers D) ensuring that the required information is shown clearly on a single screen
D) ensuring that the required information is shown clearly on a single screen
27) Which data warehouse architecture uses metadata from existing data warehouses to create a hybrid logical data warehouse comprised of data from the other warehouses? A) independent data marts architecture B) centralized data warehouse architecture C) hub-and-spoke data warehouse architecture D) federated architecture
D) federated architecture
32) Data warehouses provide direct and indirect benefits to organizations. Which of the following is an indirect benefit of data warehouses? A) better and more timely information B) extensive new analyses performed by users C) simplified access to data D) improved customer service
D) improved customer service
40) All of the following are true about in-database processing technology EXCEPT A) it pushes the algorithms to where the data is. B) it makes the response to queries much faster than conventional databases. C) it is often used for apps like credit card fraud detection and investment risk management. D) it is the same as in-memory storage technology.
D) it is the same as in-memory storage technology.
24) Which of the following developments is NOT contributing to facilitating growth of decision support and analytics? A) collaboration technologies B) Big Data C) knowledge management systems D) locally concentrated workforces
D) locally concentrated workforces
24) Oper marts are created when operational data needs to be analyzed A) linearly. B) in a dashboard. C) unidimensionally. D) multidimensionally.
D) multidimensionally.
21) In the Opening Vignette on Sports Analytics, what was adjusted to drive one-time ticket sales? A) player selections B) stadium location C) fan tweets D) ticket prices
D) ticket prices
Which of the following is the order of simulation methodology?
Define the problem, Construct the simulation model, Test and validate the model, Design the experiment, Conduct the experiment, Evaluate the results, Implement the results.
Insight Creation Capability Slide 13
Describe what happened: what is the business revenue (e.g., bad revenue)? Understand what happened: what is the reason, and how can it be solved?
Data Model
Describes how data is represented and accessed (i.e. provides definition and format of data)
Enterprise Systems-Middleware
Enterprise Application Integration (EAI): EAI can parse, duplicate, or transform data from an application to present it in an acceptable format. EAI deals with data integration with legacy systems. There is no need to redefine business practices.
Data Warehouse
IBM researchers Devlin & Murphy first described the concept of data warehouses (DW). A copy of transaction data structured for querying and reporting. It is a prerequisite to BI since it helps the organization to obtain value from its data sources by preparing and storing the enterprise data into a repository designed to support decision making.
Three Benefits of BI to Organizational Success
Improvement in operational performance Improvement in customer service Identification of new opportunities
Characteristics of Efficient Knowledge Repositories
Knowledge owners: Knowledge sharing Conditions of sharing Rewarded for knowledge sharing Knowledge seekers: Explore possibilities for searching and ranking Applicability of explicit knowledge Knowledge sharing and learning
Why do many believe that making decisions under uncertainty is more difficult than making decisions under risk?
In decision making under uncertainty, decision makers have no information at all about the various outcomes. They do not know the likelihood (or probability) that a specific outcome will occur.
Knowledge Management (KM)
Inputs: Information Knowledge Output: Creation of new knowledge Conversion to another form of knowledge Application of knowledge in making a decision
(ERP)-Benefits
Integrate business processes across the enterprise Single database for the whole enterprise Access to real-time transactional data Elimination of costly stand-alone legacy systems Elimination of complexities Provide the infrastructure for organization to improve management of order fulfillment processes Integration of different departments working in the organization
What does advanced analytics for social media do?
It examines the content of online conversations.
ERP-Implementation Problems
Large monetary investments Organizational change Technical challenges Operational problems Integrating ERP into existing legacy systems Code customization may increase complexity MUST BE CHANGED BASED ON EACH INDUSTRY (WYNN VS UNLV SIZE & DATA) 73% of all ERP implementations fail. 50% of the implementations run Most important issue is how to implement/install
ERP Control the aspects of:
Manufacturing Procurement of materials Delivery and inventory control
________, like data, must be managed to maintain their integrity, and thus their applicability.
Models
The most common simulation method for business decision problems is the ________ simulation.
Monte Carlo
Why Information Integration Capability?
Mutually disconnected, incompatible transactional systems exist within an organization. Data exists outside of transactional systems, such as e-mail, audio and video files, etc. A lot of relevant external data is available, such as web sites, industry reports, and expert opinions. More complex decision making due to an increase in the diversity of factors to consider. We need to combine these sources so that the data can be used for insight creation.
Insight Creation -> Present-> Users
New insights and information to support decision making in a real-time fashion->Information presented in user-friendly fashion and in ways most appropriate for each user
Four key capabilities of BI solutions
Organizational Memory Capability Information Integration Insight Creation Presentation
Role + Task + Preference ->Content->Format =
Presentation
Business Intelligence
Presents information to individuals with little technical expertise
Data (analyze)->info (interpret)->knowledge (apply)
Raw facts (may or may not be correct)->subset of data/ has been processed to have context->justified belief about relationships between concepts
Enterprise Resource Planning (ERP)
Refers to transactional systems that capture organizational memory related to all business processes that the organization engages in. Example: Order to cash captures all the transactions in an organization
Benchmarking
Relative to competition and industry trends
(ERP)- Vendors
SAP (www.sap.com) Oracle (www.oracle.com) Sage (www.sagenorthamerica.com) Microsoft Dynamics (www.microsoft.com/dynamics) OpenERP, GNU Enterprise, WebERP, etc.
________ analysis attempts to assess the impact of a change in the input data or parameters on the proposed solution.
Sensitivity
Organizational Memory
Storage of information in such a form that it can be later accessed and used for BI Relates to corporate memory, knowledge repository and institutional memory
Data Types
Structured Data (MGM) Unstructured Data (email/text)
Data Warehouse- Characteristics
Subject Oriented (ex: All VIP guests-- find out their consumption trend) Integrated using: Operational databases Data archives Legacy databases External data Nonvolatile Time-Variant
In the Target case study, why did Target send maternity ads to a teen? Target's analytic model confused her with an older woman with a similar name. Target was sending ads to all women in a particular neighborhood. Target's analytic model suggested she was pregnant based on her buying habits. Target was using a special promotion that targeted all teens in her geographical area.
Target's analytic model suggested she was pregnant based on her buying habits.
Operating Models-Based on Standardization and Integration
The Diversification Model (low standardization, low integration ) The Coordination Model (low standardization, high integration) The Replication Model (high standardization, low integration) The Unification Model (high standardization, high integration)
ERP-Implementation Success
Top management commitment Strong project management Team member skills Team member motivation and dedication Effective communication Effective change management. Major function of ERP for BI: collects and stores data, which then goes to the data warehouse to be made usable.
Which of the following is NOT an assumption used by a LP allocation problem?
Total returns cannot be compared.
Enterprise Architecture
The IT unit typically defines four levels of architecture below the enterprise architecture: The Business Process Architecture The Data or Information Architecture The Application Architecture The Technology Architecture
Why is the Monte Carlo simulation popular for solving business problems?
The Monte Carlo simulation is a probabilistic simulation. It is designed around a model of the decision problem, but the model itself does not account for the uncertainty of any of the variables. The simulation supplies that uncertainty by running the model a huge number of times with random changes within each of the variables. In this way, the model may be solved hundreds or thousands of times before the simulation is completed. These results can then be analyzed for either the dependent or performance variables using statistical distributions. This demonstrates a number of possible solutions, as well as providing information about the manner in which variables will respond under different levels of uncertainty.
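A minimal sketch of the idea described above, under stated assumptions: the profit formula, the price and cost figures, and the normal demand distribution are all hypothetical and chosen only for illustration. The model is solved many times with randomly drawn demand, and the results are then summarized statistically.

```python
import random
import statistics

def simulate_profit(n_trials=10_000, seed=42):
    """Run the profit model n_trials times with random demand (assumed inputs)."""
    rng = random.Random(seed)  # seeded so repeated runs give identical results
    unit_price, unit_cost, fixed_cost = 10.0, 6.0, 300.0  # hypothetical parameters
    profits = []
    for _ in range(n_trials):
        # Demand is the uncertain variable; a normal distribution is an assumption.
        demand = max(rng.gauss(100, 20), 0.0)
        profits.append((unit_price - unit_cost) * demand - fixed_cost)
    return profits

def summarize(profits):
    """Summarize the result variable across all trials."""
    return {
        "mean": statistics.mean(profits),
        "stdev": statistics.stdev(profits),
        "p_loss": sum(p < 0 for p in profits) / len(profits),
    }
```

Calling summarize(simulate_profit()) returns the average profit, its spread, and the share of trials that lose money, which is the kind of distributional information a single deterministic run cannot give.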
data mining
The process of discovering hidden patterns from data stored electronically (ex. in a data warehouse)
The ________ approach can be used in conjunction with artificial intelligence.
VIM
Which of the following statements about Web site conversion statistics is FALSE?
Visitors who begin a purchase on most Web sites must complete it.
Search engine optimization (SEO) is a means by which
Web site developers can increase Web site search rankings.
Web site usability may be rated poor if
Web site visitors download few of your offered PDFs and videos.
________ analysis is structured as "What will happen to the solution if an input variable, an assumption, or a parameter value is changed?"
What-if
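A short sketch of a what-if question asked of a simple model. The profit function and its default cost figures are hypothetical; the point is only that one input is changed and the effect on the result is observed.

```python
def profit(price, demand, unit_cost=6.0, fixed_cost=300.0):
    """Hypothetical result variable: profit for a given price and demand."""
    return (price - unit_cost) * demand - fixed_cost

def what_if(base_inputs, changes):
    """Return the change in profit when some inputs are replaced."""
    scenario = {**base_inputs, **changes}  # base inputs overridden by the changes
    return profit(**scenario) - profit(**base_inputs)
```

For example, what_if({"price": 10.0, "demand": 100}, {"price": 11.0}) asks what happens to profit if the price rises by one unit while demand stays the same.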
Which broad area of data mining applications analyzes data, forming rules to distinguish between defined classes? associations visualization classification clustering
classification
The most common method for solving a risk analysis problem is to select the alternative with the A) smallest expected value. B) greatest expected value. C) mean expected value. D) median expected value.
b
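A minimal sketch of the expected-value rule for decision making under risk. The alternatives, probabilities, and payoffs below are illustrative assumptions. Each alternative is a list of (probability, payoff) pairs, and the alternative with the greatest expected value is selected.

```python
def expected_value(outcomes):
    """Sum of probability * payoff over all outcomes of one alternative."""
    return sum(p * payoff for p, payoff in outcomes)

def best_alternative(alternatives):
    """Return the name of the alternative with the greatest expected value."""
    return max(alternatives, key=lambda name: expected_value(alternatives[name]))

# Hypothetical investment alternatives: (probability, payoff in percent).
ALTERNATIVES = {
    "bonds": [(0.5, 12.0), (0.3, 6.0), (0.2, 3.0)],
    "stocks": [(0.5, 15.0), (0.3, 3.0), (0.2, -2.0)],
    "cds": [(1.0, 6.5)],
}
```

With these assumed numbers the expected values are 8.4 for bonds, 8.0 for stocks, and 6.5 for CDs, so best_alternative(ALTERNATIVES) picks bonds.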
When the decision maker knows exactly what the outcome of each course of action will be, this is decision making under A) certainty. B) uncertainty. C) risk. D) duress.
a
In text analysis, what is a lexicon?
a catalog of words, their synonyms, and their meanings
In the opening vignette, the architectural system that supported Watson used all the following elements EXCEPT
a core engine that could operate seamlessly in another domain without changes.
When you tell a story in a presentation, all of the following are true EXCEPT a story should make sense and order out of a lot of background noise. a well-told story should have no need for subsequent discussion. stories and their lessons should be easy to remember. the outcome and reasons for it should be clear at the end of your story.
a well-told story should have no need for subsequent discussion.
The components of a quantitative model are linked by ________ expressions.
algebraic
Natural language processing (NLP) is associated with which of the following areas? a) text mining b) artificial intelligence c) computational linguistics d) all of these
all of these
Natural language processing (NLP) is associated with which of the following areas?
all of these -text mining -artificial intelligence -computational linguistics
A company/organization can encounter dirty data in the form of
all of these -invalid mailing address -invalid email address -duplicated data
Risk ________ is a decision-making method that analyzes the risk (based on assumed known probabilities) associated with different alternatives.
analysis
This measure of central tendency is the sum of all the values/observations divided by the number of observations in the data set. dispersion mode median arithmetic mean
arithmetic mean
A(n) ________ is a graphical representation of a model. A) multidimensional analysis B) influence diagram C) OLAP model D) Whisker plot
b
A(n) ________ spreadsheet model represents behavior over time. A) static B) dynamic C) looped D) add-in
b
Why Insight Creation Capability?
because every company must create innovations to survive
What is the main reason parallel processing is sometimes used for data mining?
because of the massive data amounts and search efforts involved
Organizational memory ->
benchmarking
When the decision maker must consider several possible outcomes for each alternative, each with a given probability of occurrence, this is decision making under A) certainty. B) uncertainty. C) risk. D) duress.
c
Which of the following is NOT a component of a quantitative model? A) result variables B) decision variables C) classes D) parameters
c
Which of the following is NOT a disadvantage of a simulation? A) An optimal solution cannot be guaranteed, but relatively good ones are generally found. B) Simulation software sometimes requires special skills because of the complexity of the formal solution method. C) Simulation is often the only DSS modeling method that can readily handle relatively unstructured problems. D) Simulation model construction can be a slow and costly process, although newer modeling systems are easier to use than ever.
c
Which broad area of data mining applications analyzes data, forming rules to distinguish between defined classes?
classification
Transactional Systems
capture all the relevant information for one accounting period (i.e. month/quarter/year/etc.)
In text mining, tokenizing is the process of
categorizing a block of text in a sentence.
When the decision maker knows exactly what the outcome of each course of action will be, this is decision making under
certainty.
Which of the following is NOT a component of a quantitative model?
classes
If a simulation result does NOT match the intuition or judgment of the decision maker, what can occur?
confidence gap
Multiple goals is a decision situation in which alternatives are evaluated with several, sometimes ________, goals.
conflicting
This technique makes no a priori assumption of whether one variable is dependent on the other(s) and is not concerned with the relationship between variables; instead it gives an estimate on the degree of association between the variables. regression correlation means test multiple regression
correlation
knowledge management
Creating new ideas. Ex: the touch screen from Apple
Result variables are considered independent variables.
false
Important spreadsheet features for modeling include all of the following EXCEPT A) what-if analysis. B) goal seeking. C) macros. D) pivot tables.
d
Which of the following is the order of simulation methodology? A) Define the problem, Construct the simulation model, Test and validate the model, Design the experiment, Conduct the experiment, Implement the results, Evaluate the results. B) Construct the simulation model, Test and validate the model, Define the problem, Design the experiment, Conduct the experiment, Evaluate the results, Implement the results. C) Define the problem, Construct the simulation model, Test and validate the model, Evaluate the results, Implement the results, Design the experiment, Conduct the experiment. D) Define the problem, Construct the simulation model, Test and validate the model, Design the experiment, Conduct the experiment, Evaluate the results, Implement the results.
d
Which characteristic of data requires that the variables and data values be defined at the lowest (or as low as required) level of detail for the intended use of the data? data source reliability data accessibility data richness data granularity
data granularity
Which characteristic of data means that all the required data elements are included in the data set? data source reliability data accessibility data richness data granularity
data richness
Every LP model is composed of ________ variables whose values are unknown and are searched for.
decision
In the Analyzing Disease Patterns from an Electronic Medical Records Data Warehouse case study, what was the analytic goal?
determine differences in rates of disease in urban and rural populations
A(n) ________ model can be constructed under assumed environments of certainty.
dynamic
A(n) ________ spreadsheet model represents behavior over time.
dynamic
Search engine optimization (SEO) techniques play a minor role in a Web site's search ranking because only well-written content matters.
false
Big Data simplifies data governance issues (like who owns the data or who is in charge of it), especially for global firms.
false
Companies understand that when their product goes "viral," the content of the online conversations about their product does not matter, only the volume of conversations.
false
Data mining requires specialized data analysts to ask ad hoc questions and obtain answers quickly from the system.
false
Data that is collected, stored, and analyzed in data mining is often private and personal. There is no way to maintain individuals' privacy other than being very careful about physical data security.
false
In a dataset where all values on an observation are supposed to be populated you encounter several which are empty (NULL). It is always best to replace these NULL values with the average of that column of data.
false
In decision making under uncertainty, it is assumed that complete knowledge is available.
false
In sentiment analysis, sentiment suggests a transient, temporary opinion reflective of one's feelings.
false
In the Dell case study, the largest issue was how to properly spend the online marketing budget.
false
In the Miami-Dade Police Department case study, predictive analytics helped to identify the best schedule for officers in order to pay the least overtime.
false
In the Salesforce case study, streaming data is used to identify services that customers use most.
false
In the cancer research case study, data mining algorithms that predict cancer survivability with high predictive power are good replacements for medical professionals
false
In the car insurance case study, text mining was used to identify auto features that caused injuries.
false
In the evolution of social media user engagement, the largest recent change is the growth of creators.
false
K-fold cross-validation is also called sliding estimation.
false
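For context, k-fold cross-validation is usually called rotation estimation: the data is split into k folds, and each fold takes one turn as the test set while the rest is used for training. A minimal sketch of the index split follows; the function name is an assumption, and no shuffling is done.

```python
def k_fold_indices(n_samples, k):
    """Return k (train_indices, test_indices) pairs covering range(n_samples)."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        folds.append((train, test))
        start = stop
    return folds
```

Every sample appears in exactly one test fold, so each observation is used for testing once and for training k - 1 times.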
Open-source data mining tools include applications such as IBM SPSS Modeler and Dell Statistica.
false
Ratio data is a type of categorical data.
false
Which type of visualization tool can be very helpful when a data set contains location data? bar chart geographic map highlight table tree map
geographic map
The most common method for solving a risk analysis problem is to select the alternative with the
greatest expected value.
Which Big Data approach promotes efficiency, lower cost, and better performance by processing jobs in a shared, centrally managed pool of IT resources?
grid computing
Understanding which keywords your users enter to reach your Web site through a search engine can help you understand
how well visitors understand your products
A(n) ________ is a graphical representation of a model.
influence diagram
Identifying and preventing incorrect claim payments and fraudulent activities falls under which type of data mining applications?
insurance
Identifying and preventing incorrect claim payments and fraudulent activities falls under which type of data mining applications? insurance retailing and logistics customer relationship management computer hardware and software
insurance
Key performance indicators (KPIs) are metrics typically used to measure database responsiveness. qualitative feedback. external results. internal results.
internal results.
What does the robustness of a data mining method refer to?
its ability to overcome noisy data to make somewhat accurate predictions
A decision tree can be cumbersome if there are
many alternatives.
Intermediate result variables reflect intermediate outcomes in
mathematical models.
The data field "ethnic group" can be best described as nominal data. interval data. ordinal data. ratio data.
nominal data.
A collection of data is ____ information. A collection of information is not knowledge. A collection of knowledge is not wisdom. A collection of wisdom is not truth
not
In the Twitter case study, how did influential users support their tweets?
objective data
What are the two main types of Web analytics?
off-site and on-site Web analytics
Of the available solutions, at least one is the best, in the sense that the degree of goal attainment associated with it is the highest; this is called a(n) ________ solution.
optimal
The ________ approach assumes that the best possible outcome of each alternative will occur and then selects the best of the best.
optimistic
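A small sketch of the optimistic approach, using a hypothetical payoff table. For each alternative the best possible payoff is taken, and then the alternative with the best of those is chosen.

```python
def optimistic_choice(payoffs):
    """Pick the alternative whose best-case payoff is the highest."""
    return max(payoffs, key=lambda alt: max(payoffs[alt]))

# Hypothetical payoffs per alternative under three possible states of nature.
PAYOFFS = {
    "bonds": [12.0, 6.0, 3.0],
    "stocks": [15.0, 3.0, -2.0],
    "cds": [6.5, 6.5, 6.5],
}
```

With these assumed numbers the best cases are 12.0, 15.0, and 6.5, so optimistic_choice(PAYOFFS) selects stocks even though stocks also have the worst possible outcome.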
Factors that are not under the control of the decision maker but can be fixed, are called ________.
parameters
Breaking up a Web page into its components to identify worthy words/terms and indexing them using a set of rules is called
parsing the documents
In ________ simulation, one or more of the independent variables (e.g., the demand in an inventory problem) are subject to chance variation.
probabilistic
Important spreadsheet features for modeling include all of the following EXCEPT
pivot tables.
When viewed as a binary feature, ________ classification is the binary classification task of labeling an opinionated document as expressing either an overall positive or an overall negative opinion.
polarity
Knowledge discovery in data and information;
potential overlap between BI and KM
why bi
Provides decision makers with valuable information and knowledge by leveraging a variety of data sources as well as structured and unstructured information
Data Warehouse:
provides the source of data and information for business intelligence (BI) analysis
________ is a technique used to detect favorable and unfavorable opinions toward specific products and services using large numbers of textual data sources.
sentiment analysis
Third party providers of publicly available data sets protect the anonymity of the individuals in the data set primarily by
removing identifiers such as names and social security numbers.
Third party providers of publicly available data sets protect the anonymity of the individuals in the data set primarily by asking data users to use the data ethically. leaving in identifiers (e.g., name), but changing other variables. removing identifiers such as names and social security numbers. letting individuals in the data know their data is being accessed.
removing identifiers such as names and social security numbers.
A probabilistic decision-making situation is a decision made under ________.
risk
When the decision maker must consider several possible outcomes for each alternative, each with a given probability of occurrence, this is decision making under
risk.
In a Hadoop "stack," what node periodically replicates and stores data from the Name Node should it fail?
secondary node
________ analysis attempts to assess the impact of a change in the input data or parameters on the proposed solution.
sensitivity
In the Wimbledon case study, the tournament used data for each match in real time to highlight
significant events.
Clustering partitions a collection of things into segments whose members share
similar characteristics
Conventional ________ generally reports statistical results at the end of a set of experiments.
simulation
What types of documents are BEST suited to semantic labeling and aggregation to determine sentiment orientation?
small- to medium-sized documents
ERP Systems
Software packages composed of several modules, such as human resources, sales, finance, and production, providing cross-organization integration of data through embedded business processes. (Replaced traditional departments. Now focus on research and development. Now employ knowledge workers and project managers for R&D.) Manufacturing resource planning or material requirements planning systems. ERP is centralized!
What type of VIM models display a visual image of the result of one decision alternative at a time?
static
A more general form of an influence diagram is called a(n)
cognitive map
Big Data is
the combination of data mining, text mining, and cloud computing
A decision made under risk is also known as a probabilistic or stochastic decision-making situation.
true
Articles and auxiliary verbs are assigned little value in text mining and are usually filtered out.
true
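Worked example (a minimal sketch, assuming a tiny hand-made stop-word list): filtering out articles and auxiliary verbs before text mining could look like this. Real text mining tools ship much longer stop-word lists; the list and function name here are illustrative only.

```python
# Hypothetical, deliberately short stop-word list: articles and auxiliary verbs.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "were", "has", "have", "will", "be"}


def remove_stop_words(text):
    """Lowercase the text, split on whitespace, and drop stop words."""
    tokens = text.lower().split()
    return [token for token in tokens if token not in STOP_WORDS]


filtered = remove_stop_words("The customer is happy and the service was fast")
print(filtered)
```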
Big Data is being driven by the exponential growth, availability, and use of information.
true
Categorization and clustering of documents during text mining differ only in the preselection of categories.
true
Clickstream analysis does not need users to enter their perceptions of the Web site or other feedback directly to be useful in determining their preferences.
true
Current use of sentiment analysis in voice of the customer applications allows companies to change their products or services in real time in response to customer sentiment.
true
Decision situations that involve a finite and usually not too large number of alternatives are modeled through an approach called decision analysis.
true
Data flows can be highly inconsistent, with periodic peaks, making data loads hard to manage. What is this feature of Big Data called?
variability
Identification of a model's variables (e.g., decision, result, uncontrollable) is critical, as are the relationships among the ________.
variables
Selecting the best ________ to work with is a laborious yet important task for companies and government organizations.
vendors
Search engine optimization (SEO) is a means by which
web site developers can increase Web site search rankings.
________ analysis is structured as "What will happen to the solution if an input variable, an assumption, or a parameter value is changed?"
what-if
Contextual metadata for a dashboard includes all the following EXCEPT A) whether any high-value transactions that would skew the overall trends were rejected as a part of the loading process. B) which operating system is running the dashboard server software. C) whether the dashboard is presenting "fresh" or "stale" information. D) when the data warehouse was last refreshed.
which operating system is running the dashboard server software.